CN113766285A - Volume control method, television and storage medium - Google Patents

Publication number
CN113766285A
Authority
CN
China
Prior art keywords
user
voice
television
communication equipment
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010491666.7A
Other languages
Chinese (zh)
Inventor
陈小平
于显双
蔡钧锴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunmi Internet Technology Guangdong Co Ltd
Original Assignee
Yunmi Internet Technology Guangdong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunmi Internet Technology Guangdong Co Ltd
Priority to CN202010491666.7A
Publication of CN113766285A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/4104 Peripherals receiving signals from specially adapted client devices
    • H04N21/4126 The peripheral being portable, e.g. PDAs or mobile phones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42201 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] biosensors, e.g. heat sensor for presence detection, EEG sensors or any limb activity sensors worn by the user
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/441 Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card
    • H04N21/4415 Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card using biometric characteristics of the user, e.g. by voice recognition or fingerprint scanning

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Neurosurgery (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application relates to the field of intelligent household appliances, and in particular to a volume control method, a television and a storage medium. The method comprises: if it is detected that a user brings a communication device close to the user's head, acquiring a voice signal input into the communication device by the user; determining voice information of the user according to the voice signal, and acquiring a plurality of word segments in the voice information; and if the plurality of word segments in the voice information include a preset keyword, determining that the user is in a call and controlling the television to reduce the playing volume. By detecting that the user is in a call and reducing the playing volume of the television accordingly, the method makes the television more convenient for the user to control.

Description

Volume control method, television and storage medium
Technical Field
The present application relates to the field of television technologies, and in particular, to a volume control method, a television, and a storage medium.
Background
With the continuous development and improvement of television technology, more and more people choose smart televisions to watch programs. However, on most smart televisions the user must raise or lower the playing volume with the keys of a remote controller. On the one hand, when a user answers a call, the user may not find the remote controller in time, so the playing volume of the television cannot be reduced promptly and operation is inconvenient; on the other hand, while the user is in a call, the playing volume of the television degrades the call quality, and this lack of intelligence harms the user experience.
Disclosure of Invention
The application provides a volume control method, a television and a storage medium, which reduce the playing volume of the television when the user is detected to be in a call, making the television more convenient for the user to control.
In a first aspect, the present application provides a volume control method applied to a television, the method including:
if it is detected that a user brings a communication device close to the user's head, acquiring a voice signal input into the communication device by the user;
determining voice information of the user according to the voice signal, and acquiring a plurality of word segments in the voice information;
and if the plurality of word segments in the voice information include a preset keyword, determining that the user is in a call, and controlling the television to reduce the playing volume.
In a second aspect, the present application further provides a television, where the television includes a three-dimensional structured light module, a voice collecting device, a memory, and a processor;
the three-dimensional structured light module is used for detecting whether the user brings the communication device close to the user's head or close to the user's mouth;
the voice collecting device is used for acquiring voice signals of a user;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and, when executing the computer program, to implement the volume control method described above.
In a third aspect, the present application also provides a computer-readable storage medium storing a computer program, which when executed by a processor causes the processor to implement the volume control method as described above.
The application discloses a volume control method, a television and a storage medium. When it is detected that a user brings a communication device close to the user's head, a voice signal input by the user into the communication device is acquired, so that whether the user is in a call can be judged from the voice signal. By determining the voice information corresponding to the voice signal and acquiring a plurality of word segments in the voice information, whether a preset keyword is present can be judged from the word segments. When the word segments in the voice information include a preset keyword, the user is determined to be in a call, and the television is controlled to reduce the playing volume. Operation is thus more convenient and intelligent, and the television is easier for the user to control.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic block diagram of a television according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a television provided by an embodiment of the present application;
FIG. 3 is a flow chart illustrating steps of a volume control method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a scenario in which a user answers a call according to an embodiment of the present application;
FIG. 5 is a schematic diagram of collecting a voice signal by a communication device according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the recognition principles of a speech recognition model provided by an embodiment of the present application;
FIG. 7 is a schematic view of a scenario in which a user brings a communication device close to the mouth according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic block diagram of a television according to the present application. The television 10 in the embodiment of the present application will be described below with reference to fig. 1.
Illustratively, the television 10 may be an OLED television, an LED television, a curved-surface television, a full-screen television, a 3D television, a smart television, an ultra high definition UHD television, or the like.
The television 10 has a fully open platform, and is equipped with an operating system; the user can install and uninstall various application software by himself while enjoying the common television content, and continuously expand and upgrade the functions of the new television product, thereby continuously bringing rich personalized experience to the user.
As shown in fig. 1, the television 10 includes a three-dimensional structured light module 11 and a voice collecting device 12. The three-dimensional structured light module 11 and the voice collecting device 12 may be disposed in the frame of the television 10.
Illustratively, the three-dimensional structured light module 11, i.e., a 3D (three-dimensional) structured light camera, includes a depth camera, a color camera, and a light source emitter. The depth camera obtains the distance between a measured object and the depth camera, the color camera collects an image of the measured object, and the light source emitter projects structured light onto the surface of the measured object. The three-dimensional structured light module 11 can acquire a depth image of the measured object; the depth image includes three-dimensional position and size information of the measured object.
Existing 3D camera imaging methods include structured light, binocular vision, and time of flight. In the embodiment of the present application, the three-dimensional structured light module 11 acquires images by structured light imaging. Structured light imaging works as follows: invisible infrared laser light of a specific wavelength is used as the light source, the light is projected onto the measured object as a coded pattern, and the distortion of the returned coded pattern is computed by an algorithm to obtain the position and depth information of the measured object.
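The description leaves the depth algorithm unspecified. As a minimal sketch, assuming a rectified projector and camera pair, the shift of the decoded pattern can be turned into depth with the standard triangulation relation z = f * b / d. The function name and all three parameters are illustrative assumptions, not taken from the application.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Return depth in meters from the pinhole triangulation relation z = f * b / d.

    focal_px     -- camera focal length in pixels (assumed known from calibration)
    baseline_m   -- distance between light source emitter and depth camera, in meters
    disparity_px -- shift of the decoded pattern relative to its reference position
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

With these assumed units, a larger pattern shift corresponds to a closer object.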
For example, the three-dimensional structured light module 11 may obtain depth information, three-dimensional size, and spatial information of an object in real time, and may be used in application scenarios such as motion capture and recognition, face recognition, three-dimensional modeling, indoor navigation and positioning, and the like.
By way of example, the speech acquisition device 12 may include, but is not limited to, a microphone array, a sound recorder, a recording pen, or other acoustic sensor. The microphone array is a system composed of a certain number of microphones and used for sampling and processing the spatial characteristics of a sound field. The microphone array has the functions of noise suppression, echo suppression, sound source positioning, gain adjustment and the like.
In the embodiment of the present application, the voice collecting device 12 is a microphone array, which can effectively suppress most environmental noise and improve the clarity of the acquired voice signal of the user.
In fig. 1, the television 10 is shown with the three-dimensional structured light module 11 and the voice collecting device 12 by way of example; this does not limit the three-dimensional structured light module 11 or the voice collecting device 12.
Illustratively, monitoring is performed by the three-dimensional structured light module 11. If the three-dimensional structured light module 11 detects that the user brings the communication device close to the user's head, the voice collecting device 12 is controlled to acquire the voice signal input by the user into the communication device. Whether the user is in a call is then determined according to the voice signal, and if the user is in a call, the television 10 is controlled to reduce the playing volume.
A call may include the two scenarios of answering a call and making a call, and may also include scenarios such as voice input, voice chat, and video chat in instant messaging software.
Referring to fig. 2, fig. 2 is a schematic block diagram of a television according to an embodiment of the present disclosure. In fig. 2, the television 10 includes a processor 101, a memory 102, a three-dimensional structured light module 103, and a voice collecting device 104; the processor 101, the memory 102, the three-dimensional structured light module 103, and the voice capture device 104 are connected by a bus, such as an I2C (Inter-integrated Circuit) bus.
The processor 101 is used to provide computing and control capabilities, among other things, to support the operation of the entire television 10.
The memory 102 may include a non-volatile storage medium and an internal memory. The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause a processor to perform any of the volume control methods.
The three-dimensional structured light module 103 is used for detecting whether the user brings the communication device close to the user's head or close to the user's mouth.
The voice collecting device 104 is used for acquiring a voice signal of the user and transmitting the voice signal to the processor 101 and the memory 102.
The processor 101 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
Wherein the processor 101 is configured to run a computer program stored in the memory 102, and when executing the computer program, implement the following steps:
if it is detected that a user brings a communication device close to the user's head, acquiring a voice signal input into the communication device by the user; determining voice information of the user according to the voice signal, and acquiring a plurality of word segments in the voice information; and if the plurality of word segments in the voice information include a preset keyword, determining that the user is in a call, and controlling the television to reduce the playing volume.
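The three steps can be sketched as one control function. This is only an illustration under stated assumptions: detection, signal acquisition, speech recognition, word segmentation and the volume action are passed in as callables, and every name here is hypothetical rather than part of the application.

```python
from typing import Callable, Iterable, Set


def control_volume(
    device_near_head: bool,
    acquire_voice_signal: Callable[[], bytes],
    recognize: Callable[[bytes], str],
    segment: Callable[[str], Iterable[str]],
    reduce_volume: Callable[[], None],
    preset_keywords: Set[str],
) -> bool:
    """Return True if the user is judged to be in a call and the volume was reduced."""
    # Nothing is acquired unless the device was detected near the user's head.
    if not device_near_head:
        return False
    voice_signal = acquire_voice_signal()
    # Convert the voice signal to voice information, then split it into word segments.
    voice_information = recognize(voice_signal)
    word_segments = segment(voice_information)
    # A preset keyword among the word segments means the user is in a call.
    if any(word in preset_keywords for word in word_segments):
        reduce_volume()
        return True
    return False
```

Passing the stages as callables keeps the sketch independent of any particular recognition or segmentation model.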
In some embodiments, a three-dimensional structured light module and a voice collecting device are installed in the television. When acquiring the voice signal input by the user into the communication device upon detecting that the user brings the communication device close to the user's head, the processor implements:
if the three-dimensional structured light module detects that the user brings the communication device close to the user's head, generating a voice acquisition instruction, and controlling the voice collecting device to acquire the voice signal input by the user into the communication device according to the voice acquisition instruction; or if the three-dimensional structured light module detects that the user brings the communication device close to the user's head, generating a voice acquisition instruction, and obtaining, according to the voice acquisition instruction, the voice signal input by the user that the communication device itself collects.
In some embodiments, the television is connected with the communication device through a communication module. When obtaining, according to the voice acquisition instruction, the voice signal collected by the communication device, the processor implements:
sending the voice acquisition instruction to the communication device through the communication module, so that the communication device collects and returns the voice signal input by the user according to the voice acquisition instruction.
In some embodiments, when determining the voice information of the user according to the voice signal, the processor implements:
performing noise reduction processing on the voice signal to obtain a noise-reduced voice signal corresponding to the user; and determining the voice information corresponding to the user according to the noise-reduced voice signal, based on a trained speech recognition model.
In some embodiments, when acquiring the plurality of word segments in the voice information, the processor implements:
performing word segmentation processing on the voice information according to a trained word segmentation model to obtain the plurality of word segments corresponding to the voice information.
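The application uses a trained word segmentation model, which is not reproduced here. As a stand-in that only illustrates what word segmentation produces, the following sketch uses dictionary-based forward maximum matching; the vocabulary and the sample utterance are invented for the example.

```python
from typing import List, Set


def segment(text: str, vocabulary: Set[str], max_len: int = 4) -> List[str]:
    """Split text into word segments by forward maximum matching.

    At each position the longest vocabulary entry is taken; a single
    character is emitted when no entry matches.
    """
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical vocabulary: "你好" (hello), "请问" (may I ask), "哪位" (who is this).
VOCABULARY = {"你好", "请问", "哪位"}
```

For instance, `segment("喂你好请问哪位", VOCABULARY)` yields the segments 喂, 你好, 请问 and 哪位, which can then be compared against the preset keywords.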
In some embodiments, when controlling the television to reduce the playing volume, the processor implements:
generating a volume reduction instruction, and controlling the television to reduce the playing volume to a preset volume value according to the volume reduction instruction; or generating a mute instruction, and controlling the television to mute playback according to the mute instruction.
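The two alternatives, lowering to a preset value or muting, can be sketched as follows. The instruction names, the `TvAudio` type and the default preset of 5 are assumptions made for illustration; the application does not define them.

```python
from dataclasses import dataclass
from enum import Enum


class Instruction(Enum):
    VOLUME_DOWN = "volume_down"  # reduce to a preset volume value
    MUTE = "mute"                # turn the playing volume off


@dataclass
class TvAudio:
    volume: int
    muted: bool = False


def apply_instruction(audio: TvAudio, instruction: Instruction, preset_volume: int = 5) -> TvAudio:
    """Apply a volume instruction in place and return the audio state."""
    if instruction is Instruction.MUTE:
        audio.muted = True
    elif instruction is Instruction.VOLUME_DOWN:
        # Never raise the volume: a TV already quieter than the preset stays as it is.
        audio.volume = min(audio.volume, preset_volume)
    return audio
```

Taking the minimum is a design choice of this sketch, so that a "reduce" instruction can never make the television louder.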
In some embodiments, a three-dimensional structured light module is installed in the television, and the processor further implements:
if the three-dimensional structured light module detects that the user brings the communication device close to the user's mouth, generating a voice acquisition instruction for acquiring voice signals; and if a voice signal input by the user into the communication device is acquired according to the voice acquisition instruction, controlling the television to reduce the playing volume.
In some embodiments, after controlling the television to reduce the playing volume, the processor further implements:
if it is detected that the user moves the communication device away from the user's head, controlling the television to increase the playing volume; or if it is detected that the user moves the communication device away from the user's mouth, controlling the television to increase the playing volume.
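The text says only that the volume is increased once the device moves away. One possible reading, assumed here, is to restore the volume that was in effect before the call; the class and method names are hypothetical.

```python
from typing import Optional


class CallVolumeGuard:
    """Remember the pre-call volume and restore it when the call ends (an assumption)."""

    def __init__(self, volume: int, preset_volume: int = 5) -> None:
        self.volume = volume
        self.preset_volume = preset_volume
        self._saved: Optional[int] = None

    def on_call_detected(self) -> None:
        # Save the volume only once, so repeated detections do not overwrite it.
        if self._saved is None:
            self._saved = self.volume
            self.volume = min(self.volume, self.preset_volume)

    def on_device_moved_away(self) -> None:
        # Restore only if a call had lowered the volume.
        if self._saved is not None:
            self.volume = self._saved
            self._saved = None
```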
For ease of understanding, the volume control method provided by the embodiments of the present application is described in detail below with reference to the television in fig. 1 and fig. 2. It should be noted that the television described above does not limit the application scenarios of the volume control method provided in the embodiments of the present application.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating steps of a volume control method according to an embodiment of the present application. The volume control method can be applied to the television, and by detecting that the user is in a call, the playing volume of the television is reduced, so that the operation is more convenient, and the convenience of the user for controlling the television is improved.
As shown in fig. 3, the volume control method includes steps S10 through S30.
Step S10, if it is detected that the user brings the communication device close to the head of the user, acquiring a voice signal input by the user into the communication device.
By way of example, communication devices may include, but are not limited to, electronic devices such as mobile phones, telephones, tablets, and wearable devices.
In the embodiment of the application, the three-dimensional structured light module and the voice collecting device can be installed in the television. The three-dimensional structured light module includes a depth camera, a color camera and a light source emitter, and is used for capturing and recognizing the motions of a user, for example the motion of bringing the communication device close to the user's head. The voice collecting device may be a microphone array for collecting a voice signal of the user.
Specifically, when a user watches a television program in front of the television, the user can be monitored in real time through the three-dimensional structured light module, for example to detect whether the user brings the communication device close to the head. When the motion of bringing the communication device close to the head is captured and recognized, it indicates that the user wants to answer or make a call. Fig. 4 is a schematic view of a scenario in which a user answers a call.
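How the motion is recognized is not detailed in the text. A simple illustration, assuming the depth image has already been reduced to 3D positions of the device and the head in meters, is a distance threshold; the 0.15 m default is an arbitrary assumption.

```python
import math
from typing import Tuple

Point3D = Tuple[float, float, float]


def is_device_near_head(device: Point3D, head: Point3D, threshold_m: float = 0.15) -> bool:
    """Return True when the device lies within threshold_m of the head position."""
    return math.dist(device, head) <= threshold_m
```

A real detector would likely also require the condition to hold over several consecutive frames to avoid reacting to a brief movement.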
In some embodiments, if it is detected that the user brings the communication device close to the user's head, acquiring the voice signal input by the user into the communication device may include: if the three-dimensional structured light module detects that the user brings the communication device close to the user's head, generating a voice acquisition instruction; and controlling the voice collecting device to acquire the voice signal input by the user into the communication device according to the voice acquisition instruction.
For example, while a user watches a television program in front of the television, the mobile phone the user carries rings, and the user picks up the mobile phone and holds it close to the head or ear to answer the call. The three-dimensional structured light module in the television detects the user's action of answering the call and generates a voice acquisition instruction for controlling the microphone array to collect sound. The television controls the microphone array to listen according to the voice acquisition instruction, so as to acquire the voice signal that the user inputs into the microphone of the mobile phone.
It should be noted that, when the microphone array monitors the voice signal of the user, the playing sound in the television can be automatically filtered through a built-in algorithm, so as to avoid mixing the playing sound when the voice signal of the user is collected.
In other embodiments, if it is detected that the user brings the communication device close to the user's head, acquiring the voice signal input by the user into the communication device may include: if the three-dimensional structured light module detects that the user brings the communication device close to the user's head, generating a voice acquisition instruction, and obtaining, according to the voice acquisition instruction, the voice signal input by the user that the communication device collects.
To obtain the voice signal collected by the communication device, the voice acquisition instruction is sent to the communication device through the communication module, so that the communication device collects and returns the voice signal input by the user according to the voice acquisition instruction. Fig. 5 is a schematic diagram of collecting voice signals through the communication device.
Illustratively, a communication module is arranged in the television, and the television can communicate with the communication device through the communication module.
The communication module can include, but is not limited to, a Bluetooth module, a Wi-Fi module, a 4G module, a 5G module, an NB-IoT module, a LoRa module, and the like.
For example, while a user watches a television program in front of the television, the user picks up the mobile phone, dials, and then holds the phone close to the head or ear to make the call. The three-dimensional structured light module in the television detects the user's action of making the call and generates a voice acquisition instruction. The television sends the voice acquisition instruction to the mobile phone through the communication module, so that the mobile phone collects and returns the voice signal that the user inputs into the microphone of the mobile phone according to the voice acquisition instruction.
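The application does not define the format of the voice acquisition instruction or of the returned signal. Purely as an illustration, the exchange could be carried as JSON messages; every field name below (`type`, `request_id`, `duration_s`, `samples`) is a hypothetical choice.

```python
import json
from typing import List


def build_voice_acquisition_instruction(request_id: int, duration_s: float = 3.0) -> bytes:
    """Encode the instruction the television sends through the communication module."""
    message = {"type": "voice_acquisition", "request_id": request_id, "duration_s": duration_s}
    return json.dumps(message).encode("utf-8")


def parse_voice_signal_reply(payload: bytes, request_id: int) -> List[int]:
    """Decode the reply from the communication device and return its audio samples."""
    message = json.loads(payload.decode("utf-8"))
    # Reject replies that are of the wrong kind or answer a different request.
    if message.get("type") != "voice_signal" or message.get("request_id") != request_id:
        raise ValueError("unexpected reply")
    return message["samples"]
```

The request identifier lets the television match a returned voice signal to the instruction that asked for it.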
It is understood that when a user speaks into the communication device, the microphone of the communication device converts the user's voice into an electrical signal; the communication device can therefore transmit this electrical signal to the television.
Detecting through the three-dimensional structured light module whether the user brings the communication device close to the user's head makes it possible to determine whether the user is making or answering a call, which is more accurate and intelligent; when the user is in a call, the voice collecting device is controlled to acquire the voice signal of the user.
Step S20, determining the voice information of the user according to the voice signal, and acquiring a plurality of word segments in the voice information.
Specifically, the voice signal of the user is converted into voice information, and word segmentation processing is then performed on the voice information to obtain a plurality of word segments.
In some embodiments, determining the voice information of the user according to the voice signal may include: performing noise reduction processing on the voice signal to obtain a noise-reduced voice signal corresponding to the user; and determining the voice information corresponding to the user according to the noise-reduced voice signal, based on a trained speech recognition model.
Specifically, noise reduction processing is first performed on the voice signal of the user to obtain the noise-reduced voice signal corresponding to the user.
For example, the noise reduction processing can be implemented with a spectral subtraction algorithm, a Wiener filtering algorithm, a minimum mean-square error algorithm, or a wavelet transform algorithm, yielding the noise-reduced voice signal of the user.
It should be noted that the voice signal of the user collected by the television may be mixed with noise. Noise reduction processing filters out most of the noise while retaining the user's useful information, which improves the accuracy of subsequently recognizing the voice information corresponding to the voice signal.
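Of the algorithms listed, spectral subtraction is the simplest to sketch. The example below handles a single frame with a naive DFT from the standard library, and assumes a per-bin noise magnitude estimate is already available (for instance from a speech-free interval); it is an illustration, not the application's implementation.

```python
import cmath
import math
from typing import List, Sequence


def dft(samples: Sequence[float]) -> List[complex]:
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT, keeping the real part of each output sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(frame: Sequence[float], noise_magnitude: Sequence[float],
                      floor: float = 0.0) -> List[float]:
    """Subtract the estimated noise magnitude in each bin and keep the noisy phase."""
    cleaned = []
    for value, noise in zip(dft(frame), noise_magnitude):
        # Clamp at the floor so a magnitude never becomes negative.
        magnitude = max(abs(value) - noise, floor)
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(cleaned)
```

A production system would process overlapping windowed frames with an FFT and update the noise estimate over time.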
Specifically, before the voice information corresponding to the noise-reduced voice signal is determined based on the trained speech recognition model, the noise-reduced voice signal must be preprocessed to obtain preprocessed voice data; voice feature parameters are then extracted from the preprocessed voice data to obtain the voice feature data corresponding to the voice signal. The preprocessing includes pre-emphasis, framing, and windowing. The voice feature data includes Mel-frequency cepstral feature vectors.
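The three preprocessing operations can be sketched as follows. The pre-emphasis coefficient of 0.97 and the Hamming window are common choices assumed here; the application does not state which coefficient or window it uses.

```python
import math
from typing import List, Sequence


def pre_emphasize(samples: Sequence[float], alpha: float = 0.97) -> List[float]:
    """Boost high frequencies: y[n] = x[n] - alpha * x[n - 1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[i] - alpha * samples[i - 1] for i in range(1, len(samples))]


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    return [list(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


def hamming_window(frame: Sequence[float]) -> List[float]:
    """Taper a frame with a Hamming window to reduce spectral leakage."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]
```

Each windowed frame is then passed on to feature extraction.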
Specifically, the process of extracting the speech feature parameters may include: performing fast Fourier transform processing and squaring processing on the preprocessed voice data to obtain spectral line energy corresponding to the preprocessed voice data; processing the spectral line energy based on a Mel filter group to obtain Mel frequency spectrum data corresponding to the preprocessed voice data; and carrying out cepstrum analysis on the Mel frequency spectrum data and carrying out first-order difference and second-order difference on the result of the cepstrum analysis to obtain a Mel cepstrum feature vector corresponding to the voice signal.
Wherein the mel filter bank comprises a plurality of filters; cepstral analysis may include taking a logarithm and a Discrete Cosine Transform (DCT).
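The feature extraction steps above (spectral line energy, Mel filter bank, logarithm, DCT, and first- and second-order differences) can be sketched as below. The filter count, coefficient count, triangular filter shape, and the simple central difference are assumptions made for illustration, and a naive DFT stands in for the fast Fourier transform to keep the sketch free of dependencies.

```python
import cmath
import math


def power_spectrum(frame):
    """Squared magnitude of the DFT for the non-negative frequency bins."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale (fractional bin edges)."""
    max_mel = hz_to_mel(sample_rate / 2)
    edges = [
        mel_to_hz(max_mel * i / (num_filters + 1)) * n_fft / sample_rate
        for i in range(num_filters + 2)
    ]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if left < k <= center:
                row.append((k - left) / (center - left))
            elif center < k < right:
                row.append((right - k) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(values, num_coeffs):
    """Type-II discrete cosine transform, truncated to num_coeffs outputs."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
        for c in range(num_coeffs)
    ]


def delta(rows):
    """Central difference between neighbouring frames (edge frames are repeated)."""
    last = len(rows) - 1
    return [
        [(b - a) / 2.0 for a, b in zip(rows[max(i - 1, 0)], rows[min(i + 1, last)])]
        for i in range(len(rows))
    ]


def mfcc_with_deltas(frames, sample_rate, num_filters=8, num_coeffs=4):
    """Return one vector per frame: static MFCCs, then first and second differences."""
    bank = mel_filter_bank(num_filters, len(frames[0]), sample_rate)
    static = []
    for frame in frames:
        energy = power_spectrum(frame)
        mel_energy = [sum(w * e for w, e in zip(row, energy)) for row in bank]
        log_mel = [math.log(max(e, 1e-10)) for e in mel_energy]  # avoid log(0)
        static.append(dct2(log_mel, num_coeffs))
    first = delta(static)
    second = delta(first)
    return [s + d1 + d2 for s, d1, d2 in zip(static, first, second)]
```

Each output vector has three times num_coeffs entries, since the static coefficients are concatenated with their first- and second-order differences.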
Specifically, based on the trained speech recognition model, the speech information corresponding to the speech signal is determined according to the speech feature data corresponding to the speech signal.
The trained speech recognition model can include a trained acoustic model, a trained language model, a dictionary, a decoder and other modules.
By way of example, the acoustic model may include, but is not limited to, a hidden Markov model, a convolutional neural network, a restricted Boltzmann machine, a recurrent neural network, a long short-term memory network, and the like.
Specifically, the speech feature data is input into the trained acoustic model, which outputs the phoneme information corresponding to the speech feature data. The trained acoustic model can be obtained by training an initial acoustic model to convergence on the speech in a preset speech database.
Illustratively, the language model may comprise a hidden Markov model. Text information is input into the trained language model, which outputs the probability that individual characters or words are associated with one another. The trained language model can be obtained by training an initial language model to convergence on the text information in a preset text database.
Illustratively, the dictionary includes the correspondence between characters or words and phonemes, for example the correspondence between pinyin and Chinese characters, or between phonetic symbols and words.
The decoder outputs text for the speech feature data by means of the trained acoustic model, the trained language model, and the dictionary. Fig. 6 is a schematic diagram of the recognition principle of the speech recognition model.
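How the dictionary and the language model cooperate in decoding can be illustrated with a small Viterbi search. This is a simplified sketch under stated assumptions: the acoustic model is assumed to have already produced one phoneme group per word, the lexicon and bigram log-probabilities are hand-written hypothetical data, and the name decode is not from the application.

```python
def decode(phoneme_groups, lexicon, bigram_logprob, unseen=-10.0):
    """Choose the most probable word sequence for a sequence of phoneme groups.

    lexicon maps a phoneme group to its candidate words (the dictionary role);
    bigram_logprob maps (previous_word, word) to a log-probability (the
    language model role). Unseen word pairs receive the `unseen` penalty.
    """
    # best[word] = (score, path) of the best hypothesis ending in `word`.
    best = {"<s>": (0.0, [])}
    for group in phoneme_groups:
        candidates = lexicon.get(group)
        if not candidates:
            raise KeyError(f"no dictionary entry for {group!r}")
        new_best = {}
        for word in candidates:
            score, path = max(
                (
                    (prev_score + bigram_logprob.get((prev, word), unseen), prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[word] = (score, path + [word])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]
```

For homophones such as "two", "to", and "too", the dictionary alone cannot choose between the candidates; the language model score of each candidate given the preceding word resolves the ambiguity.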
In some embodiments, the speech feature data corresponding to the voice signal of the user is input into the trained speech recognition model to obtain the voice information corresponding to the user. For example, the voice information may be "Hey, hello, may I ask what time".
The trained speech recognition model can be obtained by training the acoustic model and the language model. The speech information corresponding to the noise-reduction speech signal is determined based on the trained speech recognition model, so that the recognition accuracy can be improved.
In some embodiments, obtaining the plurality of word segments in the voice information may include: performing word segmentation processing on the voice information according to a trained word segmentation model to obtain a plurality of word segments corresponding to the voice information.
For example, the trained word segmentation model may include a Bi-LSTM-CRF neural network model. The trained word segmentation model is obtained by performing word segmentation training on an initial word segmentation model.
It should be noted that the Bi-LSTM-CRF neural network model combines a Bi-LSTM network with a CRF (Conditional Random Field) layer. The model can use not only past input features and sentence-level label information but also future input features; because it takes into account the influence of long-distance context on Chinese word segmentation, it can achieve high segmentation accuracy.
The first layer of the Bi-LSTM-CRF neural network model is a look-up layer, which uses a pre-trained or randomly initialized embedding matrix to map each character in a sentence from a one-hot vector to a low-dimensional dense vector; dropout may be applied before the next layer to mitigate overfitting. The second layer is a bidirectional LSTM layer that automatically extracts sentence features. The third layer is a CRF layer that performs sentence-level sequence labeling.
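The role of the CRF layer can be illustrated with a sketch of its decoding step. It assumes the common B/M/E/S tagging scheme (begin, middle, end, single-character word), which the application does not name, and it takes hand-written per-character emission scores in place of real Bi-LSTM outputs. Transition scores are reduced to "allowed or forbidden"; a trained CRF would learn real-valued scores.

```python
TAGS = ("B", "M", "E", "S")
# Tag pairs that form a valid segmentation, e.g. B must be followed by M or E.
ALLOWED = {
    ("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
    ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S"),
}
NEG_INF = float("-inf")


def transition(prev_tag, tag):
    return 0.0 if (prev_tag, tag) in ALLOWED else NEG_INF


def viterbi_tags(emissions):
    """Return the highest-scoring valid tag sequence for per-character scores."""
    if not emissions:
        return []
    # A sentence can only start with B or S.
    score = {t: emissions[0][t] if t in ("B", "S") else NEG_INF for t in TAGS}
    back_pointers = []
    for emission in emissions[1:]:
        new_score, pointers = {}, {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: score[p] + transition(p, tag))
            new_score[tag] = score[prev] + transition(prev, tag) + emission[tag]
            pointers[tag] = prev
        score = new_score
        back_pointers.append(pointers)
    # A sentence can only end with E or S.
    tags = [max(("E", "S"), key=lambda t: score[t])]
    for pointers in reversed(back_pointers):
        tags.append(pointers[tags[-1]])
    return tags[::-1]


def tags_to_words(chars, tags):
    """Join characters into words, closing a word at every E or S tag."""
    words, current = [], ""
    for char, tag in zip(chars, tags):
        current += char
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```

Because the search runs over the whole sentence, a locally attractive tag is rejected when it would force an invalid sequence, which is what sentence-level sequence labeling contributes beyond per-character classification.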
In some embodiments, word segmentation processing is performed on the voice information "Hey, hello, may I ask what time" according to the trained word segmentation model, so that a plurality of word segments corresponding to the voice information are obtained, for example {hey \ hello \ may I ask \ what \ time}.
The trained word segmentation model is used for carrying out word segmentation processing on the voice information, so that word segmentation can be more accurate.
Step S30, if there is a preset keyword in the multiple word segments in the voice message, determining that the user is talking, and controlling the television to reduce the playing volume.
Exemplarily, the preset keywords are words commonly used when making a call; the keywords may include, but are not limited to: {hey, hello, may I ask, good morning, have you eaten}.
Specifically, the plurality of word segments in the voice information may be matched against the preset keywords, and if a matching word is found, it may be determined that the user is in a call. For example, the matching words are "hey" and/or "hello".
It will be appreciated that when placing or answering a call, the user's first sentence is typically "Hey, hello"; therefore, when words such as "hey" or "hello" are present in the voice information, the user can be determined to be in a call.
By matching a plurality of segmented words in the voice message with preset keywords, whether the user is in a call can be determined.
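The matching step can be sketched as a set lookup. The function names and the case-insensitive comparison are assumptions for illustration, and the keyword set is supplied by the caller.

```python
def find_call_keywords(segments, preset_keywords):
    """Return the word segments that appear in the preset keyword set."""
    normalized = {keyword.strip().lower() for keyword in preset_keywords}
    return [s for s in segments if s.strip().lower() in normalized]


def is_user_in_call(segments, preset_keywords):
    """The user is treated as being in a call if any segment is a keyword."""
    return bool(find_call_keywords(segments, preset_keywords))
```

Returning the matched segments, and not only a boolean, makes it easier to log which keyword triggered the volume change.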
In some embodiments, after determining that the user is talking, a volume-down command may be generated, and the television may be controlled to decrease the playing volume to the preset volume value according to the volume-down command.
The preset volume value may be determined according to actual conditions, and the specific value is not limited herein.
By reducing the playing volume to the preset volume value, interference from the playback sound with the user's call can be minimized, which improves the call quality.
In some embodiments, after determining that the user is in a call, a mute instruction may be generated, and the television may be controlled to mute playback according to the mute instruction. Muting playback eliminates the influence of the playback sound on the user's call.
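The two alternatives, lowering to a preset value or muting, can be sketched as instruction generation followed by application. The instruction format (a small dictionary), the function names, and the default preset value of 5 are hypothetical; the application leaves the preset value open.

```python
def build_volume_instruction(current_volume, mode="lower", preset_volume=5):
    """Build a hypothetical instruction describing how playback should change."""
    if mode == "mute":
        return {"command": "mute"}
    if mode == "lower":
        # Never raise the volume if it is already at or below the preset value.
        return {"command": "set_volume", "value": min(current_volume, preset_volume)}
    raise ValueError(f"unknown mode: {mode}")


def apply_instruction(state, instruction):
    """Apply an instruction to a {'volume': int, 'muted': bool} state copy."""
    new_state = dict(state)
    if instruction["command"] == "mute":
        new_state["muted"] = True
    elif instruction["command"] == "set_volume":
        new_state["volume"] = instruction["value"]
    return new_state
```

Taking the minimum of the current and preset volumes is a design choice of this sketch, so that a television already playing quietly is not made louder when a call is detected.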
In some embodiments, the three-dimensional structured light module can detect whether the user brings the communication device close to the user's mouth; if so, it can be determined that the user is in a call. Fig. 7 shows a scene in which the user brings the communication device close to the mouth.
It can be understood that users often chat with others through instant messaging software on a mobile phone while watching television programs; when inputting voice, the user generally holds the mobile phone close to the mouth so that the voice is captured clearly. The sound of the television program is often mixed into the voice input, so that the other party cannot hear the user clearly; therefore, the playing volume of the television needs to be reduced while the user inputs voice into the mobile phone.
Specifically, if it is detected through the three-dimensional structured light module that the user brings the communication device close to the user's mouth, a voice acquisition instruction for acquiring a voice signal is generated; and if the voice signal input into the communication device by the user is acquired according to the voice acquisition instruction, the television is controlled to reduce the playing volume.
It is understood that when the user brings the communication device close to the user's mouth and the user inputs voice in the communication device, it can be determined that the user is communicating and the playing volume in the television needs to be reduced.
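The two conditions, device near the mouth and voice actually being input, can be combined as in the sketch below. How the application decides that a voice signal has been acquired is not specified; the root-mean-square energy threshold used here is an assumption, as are the function names and the threshold value.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_lower_volume(device_near_mouth, captured_samples, voice_threshold=0.05):
    """Lower the volume only if the device is near the mouth AND voice is present."""
    if not device_near_mouth:
        return False
    return rms(captured_samples) >= voice_threshold
```

Requiring both conditions avoids lowering the volume when the user merely lifts the device without speaking.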
Specifically, after the user finishes the call, the television can be controlled to increase the playing volume.
For example, if it is detected that the user moves the communication device away from the user's head, the television is controlled to increase the playing volume.
For example, if it is detected that the user moves the communication device away from the user's mouth, the television is controlled to increase the playing volume.
For example, the three-dimensional structured light module can detect whether the user moves the communication device away from the user's head or mouth. It will be appreciated that moving the communication device away from the head or mouth indicates that the user has finished the call.
By determining that the user is in a call, the television is controlled to reduce or close playing sound, so that the operation is more convenient and intelligent; after the user finishes the conversation, controlling the television to increase the playing volume; the convenience of controlling the television by the user is improved.
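The lower-then-raise behaviour can be sketched as a small controller. The class and method names are hypothetical, and restoring the exact volume that was in effect before the call is an assumption of this sketch; the application only states that the playing volume is increased after the call ends.

```python
class CallVolumeController:
    """Lowers the volume when a call is detected and restores it afterwards."""

    def __init__(self, volume, preset_volume=5):
        self.volume = volume
        self.preset_volume = preset_volume
        self._saved_volume = None  # volume to restore once the call ends

    def on_call_detected(self):
        # Ignore repeated detections so the saved volume is not overwritten.
        if self._saved_volume is None:
            self._saved_volume = self.volume
            self.volume = min(self.volume, self.preset_volume)

    def on_device_moved_away(self):
        # Only restore if the volume was lowered because of a call.
        if self._saved_volume is not None:
            self.volume = self._saved_volume
            self._saved_volume = None
```

Guarding both handlers makes them safe to call on every detection event from the structured light module, including repeated or out-of-order events.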
According to the volume control method provided by this embodiment, the three-dimensional structured light module detects whether the user brings the communication device close to the user's head, which allows a more accurate and intelligent determination of whether the user is in a call. When the user makes a call, the voice acquisition device is controlled to acquire the user's voice signal. Noise reduction filters out most of the noise in the voice signal while retaining the user's useful information, which improves the accuracy of the voice information recognized from the voice signal. Determining the voice information corresponding to the noise-reduced voice signal with the trained speech recognition model improves recognition accuracy, and segmenting the voice information with the trained word segmentation model makes the segmentation more accurate. Matching the word segments in the voice information against the preset keywords determines whether the user is in a call. Once the user is determined to be in a call, the television is controlled to lower or mute the playback sound, and after the call ends the television is controlled to raise the playing volume, which makes the television more convenient for the user to control.
The embodiment of the application further provides a computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, the computer program comprises program instructions, and the processor executes the program instructions to realize any volume control method provided by the embodiment of the application. For example, the computer program is loaded by a processor and may perform the following steps:
if the fact that a user brings a communication device close to the head of the user is detected, acquiring a voice signal input into the communication device by the user; determining voice information of the user according to the voice signal, and acquiring a plurality of word segments in the voice information; and if the plurality of word segments in the voice message have preset keywords, determining that the user is in a call, and controlling the television to reduce the playing volume.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
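The sequence of steps performed by the program can be sketched as a single function in which each stage is passed in as a callable. All parameter names are hypothetical stand-ins for the detection, acquisition, recognition, and segmentation components described in the foregoing embodiments.

```python
def control_volume(device_near_head, acquire_voice, recognize, segment,
                   preset_keywords, lower_volume):
    """Run one pass of the method; return True if the volume was lowered."""
    if not device_near_head():
        return False                # no call gesture detected, nothing to do
    signal = acquire_voice()        # voice signal input into the device
    text = recognize(signal)        # voice information (recognized text)
    segments = segment(text)        # word segments of the voice information
    if any(s in preset_keywords for s in segments):
        lower_volume()              # the user is determined to be in a call
        return True
    return False
```

Passing the stages in as callables keeps the control flow testable without a camera, microphone, or trained model.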
The computer-readable storage medium may be an internal storage unit of the television set described in the foregoing embodiment, for example, a hard disk or a memory of the television set. The computer readable storage medium may also be an external storage device of the television, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital Card (SD), a Flash memory Card (Flash Card), and the like, which are provided on the television.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A volume control method is applied to a television, and is characterized by comprising the following steps:
if the fact that a user brings a communication device close to the head of the user is detected, acquiring a voice signal input into the communication device by the user;
determining voice information of the user according to the voice signal, and acquiring a plurality of word segments in the voice information;
and if the plurality of word segments in the voice message have preset keywords, determining that the user is in a call, and controlling the television to reduce the playing volume.
2. The volume control method according to claim 1, wherein a three-dimensional structured light module and a voice acquisition device are installed in the television; and acquiring the voice signal input into the communication device by the user if it is detected that the user brings the communication device close to the head of the user comprises:
if it is detected through the three-dimensional structured light module that the user brings the communication device close to the head of the user, generating a voice acquisition instruction, and controlling the voice acquisition device to acquire, according to the voice acquisition instruction, the voice signal input into the communication device by the user; or
if it is detected through the three-dimensional structured light module that the user brings the communication device close to the head of the user, generating a voice acquisition instruction, and obtaining, according to the voice acquisition instruction, the voice signal that is input into the communication device by the user and collected by the communication device.
3. The volume control method according to claim 2, wherein the television is connected to the communication device via a communication module; and obtaining, according to the voice acquisition instruction, the voice signal that is input into the communication device by the user and collected by the communication device comprises:
sending the voice acquisition instruction to the communication device through the communication module, so that the communication device collects and returns, according to the voice acquisition instruction, the voice signal input into the communication device by the user.
4. The volume control method of claim 1, wherein said determining the voice information of the user from the voice signal comprises:
carrying out noise reduction processing on the voice signal to obtain a noise reduction voice signal corresponding to the user;
and determining the voice information corresponding to the user according to the noise reduction voice signal corresponding to the user based on the trained voice recognition model.
5. The volume control method of claim 1, wherein the obtaining the plurality of segmented words in the voice message comprises:
and performing word segmentation processing on the voice information according to the trained word segmentation model to obtain a plurality of words corresponding to the voice information.
6. The volume control method of claim 1, wherein the controlling the television to reduce the playing volume comprises:
generating a volume reduction instruction, and controlling the television to reduce the playing volume to a preset volume value according to the volume reduction instruction; or
generating a mute instruction, and controlling the television to mute playback according to the mute instruction.
7. The volume control method according to claim 1, wherein a three-dimensional structured light module is installed in the television; the method further comprises the following steps:
if it is detected through the three-dimensional structured light module that the user brings the communication device close to the mouth of the user, generating a voice acquisition instruction for acquiring a voice signal;
and if the voice signal input into the communication equipment by the user is acquired according to the voice acquisition instruction, controlling the television to reduce the playing volume.
8. The volume control method according to any one of claims 1 to 7, wherein after controlling the television to reduce the playing volume, the method further comprises:
if it is detected that the user moves the communication device away from the head of the user, controlling the television to increase the playing volume; or
if it is detected that the user moves the communication device away from the mouth of the user, controlling the television to increase the playing volume.
9. A television is characterized by comprising a three-dimensional structured light module, a voice acquisition device, a memory and a processor;
the three-dimensional structured light module is used for detecting whether the user brings the communication device close to the head of the user or close to the mouth of the user;
the voice acquisition device is used for acquiring voice signals of a user;
the memory is used for storing a computer program;
the processor for executing the computer program and implementing the volume control method according to any one of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the volume control method according to any one of claims 1 to 8.
CN202010491666.7A 2020-06-02 2020-06-02 Volume control method, television and storage medium Pending CN113766285A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010491666.7A CN113766285A (en) 2020-06-02 2020-06-02 Volume control method, television and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010491666.7A CN113766285A (en) 2020-06-02 2020-06-02 Volume control method, television and storage medium

Publications (1)

Publication Number Publication Date
CN113766285A (en) 2021-12-07

Family

ID=78782947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010491666.7A Pending CN113766285A (en) 2020-06-02 2020-06-02 Volume control method, television and storage medium

Country Status (1)

Country Link
CN (1) CN113766285A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104683847A (en) * 2015-02-04 2015-06-03 四川长虹电器股份有限公司 Television capable of controlling volume intelligently and method
CN204903983U (en) * 2015-08-21 2015-12-23 杨珊珊 Smart home systems and unmanned vehicles , intelligent maincenter equipment thereof
WO2018028360A1 (en) * 2016-08-08 2018-02-15 深圳光启合众科技有限公司 Control method and device for smart robot, and robot
CN108958490A (en) * 2018-07-24 2018-12-07 Oppo(重庆)智能科技有限公司 Electronic device and its gesture identification method, computer readable storage medium
CN109147820A (en) * 2018-08-30 2019-01-04 深圳市元征科技股份有限公司 Vehicle audio control method, device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
覃朗: ""基于 Windows CE 的智能家居终端的设计与实现"" *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211207
