CN112820291A - Intelligent household control method, system and storage medium - Google Patents

Intelligent household control method, system and storage medium

Info

Publication number
CN112820291A
Authority
CN
China
Prior art keywords
voice signal
recognized
voice
voiceprint
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110021302.7A
Other languages
Chinese (zh)
Other versions
CN112820291B (en)
Inventor
刘佐濂
郭子芊
陈江涛
宁琳
麦伟健
詹逸
向宇涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN202110021302.7A
Publication of CN112820291A
Application granted
Publication of CN112820291B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses a smart home control method, system and storage medium. The method comprises: performing voice recognition and voiceprint recognition on a voice signal to be recognized, and extracting a control instruction and voice voiceprint features; when the authorized user list does not include strangers, acquiring a first voiceprint feature of the authorized user, matching the first voiceprint feature against the voice voiceprint feature to obtain a first correlation coefficient, and generating a control signal when the first correlation coefficient is greater than a threshold; when the authorized user list includes strangers, acquiring a second voiceprint feature of an unauthorized user, matching the second voiceprint feature against the voice voiceprint feature to obtain a second correlation coefficient, and generating a control signal when the second correlation coefficient is smaller than the threshold. Compared with existing smart home control methods, the application can identify whether the user who issued the voice signal has operation authority, and is therefore more secure. The application is widely applicable in the smart home field.

Description

Intelligent household control method, system and storage medium
Technical Field
The present application relates to the field of smart home technologies, and in particular, to a smart home control method, system, and storage medium.
Background
A smart home connects the various devices in a home through Internet of Things technology and provides functions such as home appliance control, lighting control, remote control by telephone, indoor and outdoor remote control, anti-theft alarm, environment monitoring, heating and ventilation control, infrared forwarding, and programmable timing control. Compared with an ordinary home, a smart home retains the traditional living functions while adding building, network communication, information appliance and equipment automation functions; it can provide all-round information interaction and can even reduce spending on energy.
At present, smart home control generally uses fully open permissions: any speaker who issues a voice instruction to the smart home terminal can control the home. Fully open permissions easily create safety hazards, for example through misoperation by children, so the security of existing smart home control is low.
Disclosure of Invention
In view of this, an object of the present application is to provide a method, a system and a storage medium for controlling smart home, so as to improve security of smart home control.
The first technical scheme adopted by the application is as follows:
a smart home control method includes:
collecting a voice signal to be recognized;
carrying out voice recognition on the voice signal to be recognized, and extracting a keyword of the voice signal to be recognized;
acquiring a control instruction corresponding to the keyword, and acquiring an authorized user list corresponding to the control instruction;
carrying out voiceprint recognition on the voice signal to be recognized, and extracting voice voiceprint characteristics of the voice signal to be recognized;
determining that the authorized user list does not include strangers, and acquiring a first voiceprint feature of the authorized user corresponding to the control instruction; performing feature matching on the first voiceprint feature and the voice voiceprint feature to obtain a first correlation coefficient; determining that the first correlation coefficient is larger than a matching threshold value, and generating an intelligent home control signal according to the keyword;
determining that the authorized user list comprises strangers, and acquiring a second voiceprint feature of an unauthorized user corresponding to the control instruction; performing feature matching on the second voiceprint feature and the voice voiceprint feature to obtain a second correlation coefficient; determining that the second correlation coefficient is smaller than a matching threshold value, and generating an intelligent home control signal according to the keyword;
wherein a stranger is a user whose voiceprint features have not been enrolled.
Further, the step of performing voiceprint recognition on the voice signal to be recognized and extracting the voice voiceprint feature of the voice signal to be recognized specifically includes:
carrying out high-pass filtering on the voice signal to be recognized to obtain a high-pass voice signal;
framing the high-pass voice signal to obtain a high-pass voice signal frame;
multiplying the high-pass voice signal frame by a Hamming window to obtain a Hamming voice signal frame;
performing fast Fourier transform on the Hamming voice signal frame to obtain a Hamming voice signal frame frequency spectrum;
triangular band-pass filtering is carried out on the Hamming voice signal frame frequency spectrum to obtain a triangular voice signal frame frequency spectrum;
performing discrete cosine transform on the frame frequency spectrum of the triangular voice signal to obtain a Mel frequency cepstrum coefficient;
and extracting the voice voiceprint characteristics of the voice signal to be recognized according to the Mel frequency cepstrum coefficient.
Further, the step of acquiring the voice signal to be recognized specifically includes:
collecting a wake-up voice signal;
and determining that the awakening voice signal comprises an awakening word, and collecting the voice signal to be recognized.
Further, the step of acquiring the voice signal to be recognized specifically includes:
collecting a voice signal to be recognized;
performing voice framing on the voice signal to be recognized to obtain a plurality of voice signal frames to be recognized;
calculating the sum of squares of the voice signal frame to be identified to obtain the short-time energy of the voice signal frame to be identified;
and determining that the short-time energy is larger than an energy threshold value, and acquiring a voice signal to be recognized.
Further, the step of acquiring the voice signal to be recognized specifically includes:
collecting a voice signal to be recognized;
performing voice framing on the voice signal to be recognized to obtain a plurality of voice signal frames to be recognized;
calculating the short-time average zero crossing rate of the voice signal frame to be identified;
and determining that the short-time average zero crossing rate is greater than a zero crossing rate threshold value, and collecting the voice signal to be recognized.
Further, the first correlation coefficient is a Pearson correlation coefficient, and the second correlation coefficient is a Pearson correlation coefficient.
Further, the step of acquiring the voice signal to be recognized specifically includes:
collecting unprocessed voice signals to be recognized;
and denoising the unprocessed voice signal to be recognized to obtain the voice signal to be recognized.
The second technical scheme adopted by the application is as follows:
an intelligent home control system, comprising:
the voice acquisition module is used for acquiring a voice signal to be recognized;
the voiceprint recognition module is used for carrying out voiceprint recognition on the voice signal to be recognized and extracting the voice voiceprint characteristics of the voice signal to be recognized;
the voice recognition module is used for carrying out voice recognition on the voice signal to be recognized and extracting a keyword of the voice signal to be recognized;
the authority acquisition module is used for acquiring a control instruction corresponding to the keyword and acquiring an authorized user list corresponding to the control instruction;
the authority matching module is used for determining that the authorized user list does not include strangers and acquiring a first voiceprint characteristic of the authorized user corresponding to the control instruction; performing feature matching on the first voiceprint feature and the voice voiceprint feature to obtain a first correlation coefficient; determining that the first correlation coefficient is larger than a matching threshold value, and generating an intelligent home control signal according to the keyword; determining that the authorized user list comprises strangers, and acquiring a second voiceprint feature of an unauthorized user corresponding to the control instruction; performing feature matching on the second voiceprint feature and the voice voiceprint feature to obtain a second correlation coefficient; determining that the second correlation coefficient is smaller than a matching threshold value, and generating an intelligent home control signal according to the keyword;
wherein a stranger is a user whose voiceprint features have not been enrolled.
The third technical scheme adopted by the application is as follows:
an intelligent home control system, comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is enabled to realize the smart home control method.
The fourth technical scheme adopted by the application is as follows:
a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the smart home control method.
The beneficial effects of the embodiments of the application are as follows: a voice signal to be recognized is collected and subjected to voice recognition and voiceprint recognition to obtain its keyword and voiceprint features; the corresponding control instruction and the enrolled voiceprint features associated with that instruction are obtained from the keyword; the voiceprint features of the voice signal are matched against the enrolled voiceprint features to judge whether the speaker has control authority; and the smart home control signal is generated only once control authority is confirmed. Compared with existing smart home control methods, the application can identify whether the user who issued the voice signal has operation authority, and is therefore more secure.
Drawings
Fig. 1 is a flowchart of an intelligent home control method according to an embodiment of the present application.
Detailed Description
The conception, specific structure and technical effects of the present application will be described clearly and completely with reference to the following embodiments and the accompanying drawings, so that the purpose, scheme and effects of the present application can be fully understood.
The present application will now be described in further detail with reference to the accompanying drawings and specific examples. The step numbers in the following embodiments are provided only for convenience of illustration; they do not limit the order of the steps, and the execution order of the steps in the embodiments may be adapted according to the understanding of those skilled in the art. Further, "several" as used in the following embodiments means at least one.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element of the same type from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. The use of any and all examples, or exemplary language ("e.g.", "such as", etc.), provided herein is intended merely to better illuminate embodiments of the application and does not pose a limitation on the scope of the application unless otherwise claimed.
An existing smart home terminal collects the user's speech, performs voice recognition to extract the control information, and transmits that information to a processor, which generates the corresponding control signal to realize voice control of the smart home. However, existing terminals cannot perform voiceprint recognition, so every user can control the terminal; if a child operates the terminal without a guardian present, a safety hazard can easily result.
As shown in fig. 1, an embodiment of the present application provides an intelligent home control method, including:
S110, collecting a voice signal to be recognized;
S120, carrying out voice recognition on the voice signal to be recognized, and extracting keywords of the voice signal to be recognized;
S130, acquiring a control instruction corresponding to the keyword, and acquiring an authorized user list corresponding to the control instruction;
S140, carrying out voiceprint recognition on the voice signal to be recognized, and extracting voice voiceprint characteristics of the voice signal to be recognized;
S150, determining that the authorized user list does not include strangers, and acquiring a first voiceprint characteristic of the authorized user corresponding to the control instruction; performing feature matching on the first voiceprint feature and the voice voiceprint feature to obtain a first correlation coefficient; determining that the first correlation coefficient is larger than a matching threshold value, and generating an intelligent home control signal according to the keyword;
S160, determining that the authorized user list comprises strangers, and acquiring a second voiceprint feature of an unauthorized user corresponding to the control instruction; performing feature matching on the second voiceprint feature and the voice voiceprint feature to obtain a second correlation coefficient; and determining that the second correlation coefficient is smaller than a matching threshold value, and generating an intelligent home control signal according to the keyword.
During smart home control, the smart home terminal collects the voice signal to be recognized and performs both voiceprint recognition and voice recognition on it. Voice recognition extracts the keyword of the signal; once the keyword and its corresponding control instruction are obtained, the authorized users for that instruction, i.e. the users who have permission for the operation, are looked up. In parallel, voiceprint recognition extracts the voice voiceprint feature of the signal. When the authorized user list includes strangers, the instruction is forbidden only to certain family members, such as children and the elderly. In this case the voice voiceprint feature is matched against the voiceprint features of those family members, i.e. the second voiceprint features of the unauthorized users, and the instruction is executed once it is determined that the speaker is not an unauthorized user. When the authorized user list does not include strangers, the instruction is a restricted one that only certain family members may execute. In this case the voice voiceprint feature is matched against the voiceprint features of the family members who hold the permission, i.e. the first voiceprint features of the authorized users, and the instruction is executed once it is determined that an authorized user issued it.
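The two branches above (S150 and S160) can be sketched as a single decision function. This is a minimal illustration, not the patent's implementation: the `STRANGER` marker, the `is_allowed` and `correlation` names, the dictionary layout of enrolled voiceprints, and the threshold value of 0.8 are all assumptions introduced here, since the source does not specify data structures or a threshold.

```python
import numpy as np

STRANGER = "stranger"  # hypothetical marker: any user without an enrolled voiceprint


def correlation(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    return float(np.corrcoef(x, y)[0, 1])


def is_allowed(voice_feature, authorized_users, enrolled, threshold=0.8):
    """Decide whether the speaker may run a command.

    authorized_users: names allowed to run the command (may contain STRANGER).
    enrolled: dict mapping each enrolled user name to a voiceprint vector.
    threshold: assumed matching threshold; the source gives no value.
    """
    if STRANGER not in authorized_users:
        # Restricted command (S150): the speaker must match an authorized user.
        candidates = [enrolled[name] for name in authorized_users]
        return any(correlation(voice_feature, c) > threshold for c in candidates)
    # Open command (S160): the speaker must not match any unauthorized enrolled user.
    blocked = [v for name, v in enrolled.items() if name not in authorized_users]
    return all(correlation(voice_feature, b) < threshold for b in blocked)
```

With this layout, a command open to everyone except children lists `STRANGER` plus the permitted family members, and only the children's voiceprints are compared; a command restricted to parents lists only the parents, and only their voiceprints are compared.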
For voiceprint recognition of the voice signal to be recognized, Mel-frequency cepstral coefficients (MFCC) can be used to extract the voiceprint features, balancing security and recognition speed. Compared with linear cepstrum analysis, MFCC does not depend on the assumptions of a speech production model and extracts text-independent voiceprint features better, which suits the requirement of not being tied to specific commands. After feature extraction, identities in the database are matched and verified with a feature matching algorithm. Feature extraction must reduce the influence of information irrelevant to recognition and reduce the amount of data processed in the subsequent recognition stage; beyond extracting basic features that characterize a speaker, the extracted features must distinguish different users effectively while remaining relatively stable across variations in the same user's voice. To extract the voice feature parameters, the voice signal to be recognized is first passed through a high-pass filter to obtain a high-pass voice signal. The filter reduces information irrelevant to voiceprint recognition and the data volume of the subsequent recognition stage. Boosting the high-frequency part flattens the spectrum of the signal; it also removes the effects of the lips and vocal cords during phonation and compensates for the high-frequency part of the voice signal that is suppressed by the vocal system. Then every 256 sampling points are grouped into one observation unit to obtain a high-pass voice signal frame.
Because the Fourier transform requires a stationary input, the sound signal is processed frame by frame so that each frame can be treated as stationary and the Fourier transform yields the distribution of its frequency components. Each high-pass voice signal frame is multiplied by a Hamming window to obtain a Hamming voice signal frame; windowing increases the continuity at the left and right ends of the frame and reduces the Gibbs effect, making the signal more continuous overall. A fast Fourier transform is then applied to each frame to obtain the energy distribution over the spectrum. The fast Fourier transform is given by:
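The high-pass filtering and framing steps just described might look as follows. This is a sketch under stated assumptions: the first-order filter form and its coefficient `alpha=0.97` are a common choice for pre-emphasis but are not given in the source, and the 128-sample hop between frames is likewise assumed; only the 256-sample frame length comes from the text. The function names are illustrative.

```python
import numpy as np


def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])


def split_frames(signal, frame_len=256, hop=128):
    """Cut a 1-D signal into frames of frame_len samples, hop samples apart."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad short inputs so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i of idx holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]
```

Overlapping frames (hop smaller than the frame length) are a conventional way to avoid losing information at frame boundaries; setting `hop=256` gives non-overlapping frames instead.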
$$ s(k) = \mathrm{DFT}[S(n)] $$
where s(k) is the transform (image function) of the Hamming voice signal frame after the fast Fourier transform, S(n) is the Hamming voice signal frame, and DFT[·] denotes the fast Fourier transform. Once s(k) is obtained, its energy distribution, i.e. the Hamming voice signal frame spectrum, can be computed as follows:
$$ p(k) = |s(k)|^2 $$
where p(k) is the Hamming voice signal frame spectrum after the fast Fourier transform. Triangular band-pass filtering is then applied to the Hamming voice signal frame spectrum to obtain a triangular voice signal frame spectrum; this smooths the spectrum, eliminates spectral harmonics, highlights the formants of the voice, and reduces the subsequent amount of computation. A discrete cosine transform of the triangular voice signal frame spectrum then yields L-order Mel-frequency cepstral coefficients. The formula for converting linear frequency to Mel-frequency cepstral coefficients is as follows:
Figure BDA0002888673130000061
After the Mel-frequency cepstral coefficients are obtained, note that standard MFCCs reflect only the static characteristics of the voice signal to be recognized; its dynamic characteristics can be described by the difference spectrum of the static features. Together, the static and dynamic features form the voice voiceprint feature of the voice signal to be recognized.
After the voice voiceprint features of the voice signal to be recognized are extracted, they must be matched against the first voiceprint features of the authorized users or the second voiceprint features of the unauthorized users. Conventional voiceprint recognition matches against every user in the database. Here, depending on whether the authorized users of the controlled device include strangers, only the voiceprint data of the users with, or without, permission for the current instruction is fetched from the database, which avoids spending a large amount of time on matching when the database holds a lot of data. Matching only against the relevant users for the given command reduces the time cost further. The Pearson correlation coefficient can be used for voiceprint matching, with the enrolled feature that has the largest Pearson correlation coefficient selected as the candidate match. If the correlation coefficient is smaller than the set threshold, the voiceprint is judged to be outside the matching range; if it is larger than the set threshold, the user with the largest Pearson correlation coefficient is taken as the matched user. The Pearson correlation coefficient is calculated as follows:
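The chain from windowing to cepstral coefficients and their difference spectrum can be sketched with NumPy alone. Several details here are assumptions rather than statements from the source: the Mel mapping 2595·log10(1 + f/700) is the commonly used one, the logarithm before the discrete cosine transform is standard practice that the text does not mention, and the 20 filters, 12 coefficients and 8000 Hz sampling rate are illustrative defaults. All function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """Commonly used Hz-to-Mel mapping (assumed; not quoted from the source)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular band-pass filters spaced evenly on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - center)
    return bank


def mfcc(frames, sample_rate=8000, n_filters=20, n_coeffs=12):
    """Return one row of Mel-frequency cepstral coefficients per frame."""
    frames = np.asarray(frames, dtype=float)
    n_fft = frames.shape[1]
    windowed = frames * np.hamming(n_fft)                  # Hamming window
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2     # energy spectrum
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)                # small offset avoids log(0)
    # DCT-II written as a matrix product to avoid extra dependencies.
    m = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(1, n_coeffs + 1), m + 0.5) / n_filters)
    return log_energies @ basis.T


def delta(features):
    """Central difference between neighbouring frames (dynamic feature)."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0
```

A combined voiceprint feature per frame could then be formed as `np.hstack([static, delta(static)])` where `static = mfcc(frames)`; how the source aggregates frames into a single feature vector is not stated.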
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
where x is the voice voiceprint feature, y is the first voiceprint feature or the second voiceprint feature, $\bar{x}$ is the average of the voice voiceprint feature, and $\bar{y}$ is the average of the first voiceprint feature or the average of the second voiceprint feature.
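The matching rule described above (take the enrolled feature with the largest Pearson correlation coefficient, then compare it with the threshold) can be written out directly. This is a minimal sketch: the function names, the dictionary of candidates, and the threshold of 0.8 are assumptions for illustration, and the candidate set is assumed to be non-empty.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    denom = np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    # A constant vector has no defined correlation; treat it as no match.
    return float(np.sum(dx * dy) / denom) if denom > 0 else 0.0


def best_match(voice_feature, candidates, threshold=0.8):
    """Return (name, r) for the best candidate above threshold, else (None, r).

    candidates: non-empty dict mapping user name to an enrolled voiceprint vector.
    threshold: assumed value; the source does not specify one.
    """
    scores = {name: pearson(voice_feature, feat) for name, feat in candidates.items()}
    name = max(scores, key=scores.get)
    r = scores[name]
    return (name, r) if r > threshold else (None, r)
```

Because only the users relevant to the current command are passed in as `candidates`, the number of correlations computed stays small regardless of how many voiceprints the database holds.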
In actual use it is not reasonable to process every voice signal in real time. To save hardware and software resources, voice activity detection is performed so that only useful signal is passed on for recognition. Voice activity detection combines short-time energy and the short-time average zero-crossing rate in a double-threshold method. Short-time energy is simple to compute and responds strongly to sound, while the short-time average zero-crossing rate performs well where there is no noise; combining the two under the constraint of double thresholds reduces both the error rate and the consumption of computing resources. The voice signal is first framed with a frame length of 20 ms; at an input sampling rate of 8000 Hz, each frame is 160 samples long. The sum of squares of the samples in a frame gives its short-time energy. For the zero-crossing rate, the samples in the frame are shifted by one position and multiplied point by point with the original; a negative product indicates a zero crossing, so counting the negative products in the frame gives its short-time average zero-crossing rate. The energy threshold and the zero-crossing rate threshold can be obtained through neural network training. Two thresholds are set, for the short-time energy and the short-time average zero-crossing rate respectively; both quantities are computed for each frame, and speech is judged to start when both exceed their thresholds.
Offline wake-up is an important entry point for voice interaction. Its main difficulty lies in balancing fast, accurate recognition against low power consumption, and its real-time nature requires the wake-up system to be running at all times. The smart home control method uses the open-source wake-word detection engine Snowboy. Snowboy supports custom wake words, so a model can be trained on the wake word chosen by the user, achieving fast and accurate wake-up. Snowboy requires no network connection, consumes few resources, and is highly customizable. Building a wake word requires recording it three times on the terminal, uploading the recordings to the official Snowboy website for training and adjusting the sensitivity as needed, and, once training finishes, downloading the trained wake-word model to the terminal. For example, to build the wake word "Xiaobai Xiaobai" on a Raspberry Pi terminal: first install Snowboy's dependencies on the Raspberry Pi, then compile and install Snowboy with the make command. Next, record the wake word "Xiaobai Xiaobai" three times on the terminal, in a relatively quiet environment, and save the recordings as wav files. Finally, log in to the official Snowboy website, upload the three recordings, wait for training to finish, test on the terminal, adjust the sensitivity according to actual conditions, download the trained pmdl model file to the terminal, and replace the original model. The wake word has now been changed to "Xiaobai Xiaobai".
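The per-frame energy and zero-crossing computations described above translate almost line for line into NumPy. This is a sketch, not the patent's code: the function names are illustrative, frames are taken as non-overlapping, and the thresholds are passed in by the caller because the source obtains them by training and gives no values.

```python
import numpy as np


def frame_signal(signal, frame_len=160):
    """Split into non-overlapping frames (20 ms at 8000 Hz = 160 samples)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)


def zero_crossing_count(frames):
    """Multiply each sample by its neighbour; a negative product is a crossing."""
    products = frames[:, :-1] * frames[:, 1:]
    return np.sum(products < 0, axis=1)


def speech_frames(signal, energy_threshold, zcr_threshold, frame_len=160):
    """Boolean mask: True where both energy and zero crossings exceed their thresholds."""
    frames = frame_signal(signal, frame_len)
    return (short_time_energy(frames) > energy_threshold) & (
        zero_crossing_count(frames) > zcr_threshold
    )
```

The first `True` entry in the returned mask marks the frame at which speech is judged to start under the rule in the text.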
The embodiment of the present application further provides an intelligent home control system, including:
the voice acquisition module is used for acquiring a voice signal to be recognized;
the voiceprint recognition module is used for carrying out voiceprint recognition on the voice signal to be recognized and extracting the voice voiceprint characteristics of the voice signal to be recognized;
the voice recognition module is used for carrying out voice recognition on the voice signal to be recognized and extracting a keyword of the voice signal to be recognized;
the authority acquisition module is used for acquiring a control instruction corresponding to the keyword and acquiring an authorized user list corresponding to the control instruction;
the authority matching module is used for determining that the authorized user list does not include strangers and acquiring a first voiceprint characteristic of the authorized user corresponding to the control instruction; performing feature matching on the first voiceprint feature and the voice voiceprint feature to obtain a first correlation coefficient; determining that the first correlation coefficient is larger than a matching threshold value, and generating an intelligent home control signal according to the keyword; determining that the authorized user list comprises strangers, and acquiring a second voiceprint feature of an unauthorized user corresponding to the control instruction; performing feature matching on the second voiceprint feature and the voice voiceprint feature to obtain a second correlation number; and determining that the second correlation number is smaller than a matching threshold value, and generating an intelligent household control signal according to the keyword.
The smart home control system may comprise a voice activity detection module, an offline wake-up module, a voice recognition module, a voiceprint recognition module, an authority matching module, and an instruction execution module. After the user speaks the wake word, the terminal is activated through the offline wake-up module. The terminal then records the voice signal to be recognized from the user and passes it to the voice recognition module and the voiceprint recognition module for processing. The voice recognition module converts the speech into text and then extracts keywords; the voiceprint recognition module uploads the user's voice signal to be recognized to a server, which performs voiceprint recognition, extracts the voiceprint features, compares them with the voiceprint features in the database, and returns the comparison result. Based on the returned voiceprint recognition result, the terminal judges whether the user has the authority to control the household device; if so, it executes the instruction controlling the corresponding device, and if not, it does not execute the instruction.
The system comprises a user interaction interface, a Raspberry Pi terminal, a server, a voiceprint recognition algorithm, Baidu Intelligent Cloud real-time speech recognition, and various household devices.
The smart home control system uses a Raspberry Pi as its core control module. The Raspberry Pi is compatible with various deep learning frameworks, full-featured, supported by a wide variety of modules, low in power consumption, and moderate in computing speed. The Raspberry Pi interacts with the mobile phone APP, receives audio input data from the microphone, outputs audio data to the loudspeaker, and controls the ESP8266. In addition, the Raspberry Pi serves as the software platform that processes the voice signals in order to control the household devices.
The smart home control system uses a ReSpeaker microphone module for audio input, and the ReSpeaker microphone is connected to pins 1-40 of the Raspberry Pi. The recorded audio file is saved locally on the Raspberry Pi in preparation for subsequent voice recognition and voiceprint recognition. Compared with a common USB microphone, the ReSpeaker microphone module offers better noise reduction, a longer pickup distance, and better sound pickup.
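The voice activity detection module listed above can be illustrated with the two per-frame criteria recited in claims 4 and 5: short-time energy (sum of squared samples) and short-time average zero-crossing rate. The following is a minimal sketch under stated assumptions: the function names, frame parameters, and threshold values are hypothetical, and combining the two criteria with a logical OR is an illustrative choice, since the claims present them separately.

```python
def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames, dropping a trailing partial frame."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squares of the samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_voice_frame(frame, energy_threshold, zcr_threshold):
    """Assumed combination: a frame counts as voice if either criterion fires."""
    return (short_time_energy(frame) > energy_threshold
            or zero_crossing_rate(frame) > zcr_threshold)
```

A terminal could, for instance, start collecting the voice signal to be recognized once `is_voice_frame` returns `True` for a frame; suitable thresholds would have to be tuned for the microphone and the room.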
The smart home control system uses a mobile phone APP as its graphical user interface. The APP is developed in Java and communicates with the Raspberry Pi over a Socket connection; through it, the user can perform operations such as user management, home management, and permission setting. The user opens the APP pre-installed on the mobile phone, enters the login page, and inputs a user name and a password; if the password is correct, the menu page is shown, and if it is wrong, an error prompt is displayed. The menu page has three options, user management, home management, and permission setting, and clicking an option opens the corresponding page. On the user management page, the user can create and delete users. To create a user, the user inputs the name information and then clicks "record". Before recording, the mobile phone sends a signal to the smart home terminal, which starts the microphone after receiving the signal; meanwhile, the mobile phone prompts the user to record voice information using the smart home terminal. After recording is finished, the smart home terminal uploads the voice information and the user information to the server, and the server extracts the voiceprint features of the voice information and stores the extracted feature data and the user name in the database. Users whose information has been entered are displayed in a list by user name; when the user clicks a user name, a dialog asks whether to delete that user. Clicking "Yes" deletes the user; clicking "No" does nothing. After a deletion is confirmed, the mobile phone sends the information to the smart home terminal, the smart home terminal parses it and sends it to the server, and the server deletes the voiceprint feature data of that user.
On the home management page, the user can add and delete household devices. After the setting is completed, the mobile phone sends the information to the smart home terminal, and the household device information on both the mobile phone and the smart home terminal is updated. On the "set authority" page, the user can set, for each enrolled user, the permission to use each registered household device. After the setting is completed, the mobile phone sends the information to the smart home terminal, the smart home terminal parses it and sends it to the server, and the server updates the database with the user name and that user's authority to control the different household devices.
The smart home control system uses a loudspeaker for amplified audio output. The loudspeaker has two connectors: one plugs into the 3.5 mm audio output jack of the Raspberry Pi to receive the audio signal output by the Raspberry Pi, and the other plugs into a USB port of the Raspberry Pi to power the loudspeaker.
The smart home control system may use an ESP-12F WiFi module to communicate with the household devices. The module's core processor, the ESP8266, integrates an industry-leading Tensilica L106 ultra-low-power 32-bit micro MCU in a small package, has a 16-bit reduced mode, supports the standard IEEE 802.11 b/g/n protocols and a complete TCP/IP protocol stack, and can be used to add networking capability to existing equipment. When the ESP8266 module is powered on and connects to a preset local area network, it reports its current IP information through the serial port. The Raspberry Pi can control the state of the relevant pins by accessing that IP address with a pre-programmed command appended, thereby switching the household device on or off.
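How the Raspberry Pi might "access the IP and add a pre-programmed command" can be sketched as below. The source does not specify the transport or the command format, so this sketch assumes the ESP8266 firmware exposes a plain HTTP endpoint; the function names, the URL layout, and the example command `relay/on` are hypothetical.

```python
from urllib.request import urlopen


def build_command_url(ip, command):
    """Append a pre-programmed command to the module's IP address (assumed HTTP)."""
    return f"http://{ip}/{command.lstrip('/')}"


def send_command(ip, command, timeout=2.0):
    """Send the command to the ESP8266 and return the HTTP status code.

    Assumes the firmware answers GET requests; raises URLError if the
    module is unreachable.
    """
    with urlopen(build_command_url(ip, command), timeout=timeout) as response:
        return response.status
```

For example, `send_command("192.168.1.50", "relay/on")` would request `http://192.168.1.50/relay/on`, where the address is whatever the module reported over the serial port after joining the local area network.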
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
The embodiment of the present application further provides an intelligent home control system, including:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is enabled to realize the smart home control method.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
In addition, a storage medium is further provided, in which processor-executable instructions are stored; when executed by a processor, the processor-executable instructions are used to perform the steps of the smart home control method according to any one of the above method embodiments. The storage medium may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The contents of the foregoing method embodiments are all applicable to this storage medium embodiment; the functions specifically implemented by this storage medium embodiment are the same as those of the foregoing method embodiments, and the beneficial effects achieved are also the same as those achieved by the foregoing method embodiments.
In the present application, the mobile phone APP serves as the graphical user interface, through which the user can perform operations such as user management, home management, and permission setting. The user can set the usage permissions of different household devices for family members or strangers according to actual needs. Voiceprint recognition is used to distinguish different speakers and thereby determine the speaker's identity. Voice recognition is used to extract the speaker's command for controlling a household device. The speaker's command, the speaker identity determined by voiceprint recognition, and the control permissions set by the user are then combined to judge whether the speaker has the authority to issue that command. The smart home terminal responds differently according to the authority.
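The voiceprint feature extraction chain recited in claim 2 (high-pass filtering, framing, Hamming window, fast Fourier transform, triangular band-pass filtering, discrete cosine transform) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of the application: the pre-emphasis coefficient 0.97, the frame length, hop, filter count, and coefficient count are assumed values; the mel-scale formula and the logarithm taken before the DCT follow the conventional MFCC recipe and are not stated in the claim; all function names are hypothetical.

```python
import cmath
import math


def pre_emphasis(signal, alpha=0.97):
    """High-pass filter y[n] = x[n] - alpha * x[n-1]; alpha is an assumed value."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames, dropping a trailing partial frame."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame):
    """Multiply one frame by a Hamming window."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddled = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)]
            + [even[k] - twiddled[k] for k in range(n // 2)])


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular band-pass filters spaced evenly on the mel scale."""
    def to_mel(hz):
        return 2595 * math.log10(1 + hz / 700)

    def to_hz(mel):
        return 700 * (10 ** (mel / 2595) - 1)

    mel_max = to_mel(sample_rate / 2)
    points = [to_hz(i * mel_max / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * hz / sample_rate)) for hz in points]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def dct(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=12):
    """Return one list of mel-frequency cepstral coefficients per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for frame in frame_signal(pre_emphasis(signal), frame_len, hop):
        spectrum = fft(hamming(frame))
        power = [abs(c) ** 2 / frame_len for c in spectrum[:frame_len // 2 + 1]]
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        log_energies = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
        features.append(dct(log_energies, n_coeffs))
    return features
```

A voiceprint feature vector could then be derived from the per-frame coefficients, for example by averaging them over all frames; that last step is likewise an assumption, since the claim only states that the feature is extracted according to the mel-frequency cepstral coefficients.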
It should be appreciated that the layers, modules, units, platforms, and/or the like included in an embodiment system of the application may be implemented or embodied by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Moreover, the data processing flows performed by the layers, modules, units, and/or platforms included in the system embodiments of the present application may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The data processing flows correspondingly performed by the layers, modules, units and/or platforms included in the system of embodiments of the present application may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or a combination thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the system may be implemented in any type of computing platform operatively connected to a suitable connection, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. The data processing flows correspondingly executed by the layers, modules, units and/or platforms included in the system of the present application may be implemented in machine readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, an optical read and/or write storage medium, a RAM, a ROM, etc., so that it may be read by a programmable computer, and when the storage medium or device is read by a computer, may be used to configure and operate the computer to perform the processes described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The present application also includes the computer itself when programmed according to the methods and techniques described herein.
The above description is only a preferred embodiment of the present application, and the present application is not limited to the above embodiment, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present application should be included in the protection scope of the present application as long as the technical effects of the present application are achieved by the same means. Various modifications and variations of the technical solution and/or embodiments thereof are possible within the protective scope of the present application.

Claims (10)

1. A smart home control method, characterized by comprising the following steps:
collecting a voice signal to be recognized;
carrying out voice recognition on the voice signal to be recognized, and extracting a keyword of the voice signal to be recognized;
acquiring a control instruction corresponding to the keyword, and acquiring an authorized user list corresponding to the control instruction;
carrying out voiceprint recognition on the voice signal to be recognized, and extracting voice voiceprint characteristics of the voice signal to be recognized;
determining that the authorized user list does not include strangers, and acquiring a first voiceprint feature of the authorized user corresponding to the control instruction; performing feature matching on the first voiceprint feature and the voice voiceprint feature to obtain a first correlation coefficient; determining that the first correlation coefficient is larger than a matching threshold value, and generating an intelligent home control signal according to the keyword;
determining that the authorized user list includes strangers, and acquiring a second voiceprint feature of an unauthorized user corresponding to the control instruction; performing feature matching on the second voiceprint feature and the voice voiceprint feature to obtain a second correlation coefficient; determining that the second correlation coefficient is smaller than the matching threshold value, and generating a smart home control signal according to the keyword;
wherein the stranger is a user who does not enter voiceprint features.
2. The smart home control method according to claim 1, wherein the step of performing voiceprint recognition on the voice signal to be recognized and extracting the voiceprint feature of the voice signal to be recognized specifically comprises:
carrying out high-pass filtering on the voice signal to be recognized to obtain a high-pass voice signal;
framing the high-pass voice signal to obtain a high-pass voice signal frame;
multiplying the high-pass voice signal frame by a Hamming window to obtain a Hamming voice signal frame;
performing fast Fourier transform on the Hamming voice signal frame to obtain a Hamming voice signal frame frequency spectrum;
triangular band-pass filtering is carried out on the Hamming voice signal frame frequency spectrum to obtain a triangular voice signal frame frequency spectrum;
performing discrete cosine transform on the frame frequency spectrum of the triangular voice signal to obtain a Mel frequency cepstrum coefficient;
and extracting the voice voiceprint characteristics of the voice signal to be recognized according to the Mel frequency cepstrum coefficient.
3. The smart home control method according to claim 1, wherein the step of collecting the voice signal to be recognized specifically includes:
collecting a wake-up voice signal;
and determining that the awakening voice signal comprises an awakening word, and collecting the voice signal to be recognized.
4. The smart home control method according to claim 1, wherein the step of collecting the voice signal to be recognized specifically includes:
collecting a voice signal to be recognized;
performing voice framing on the voice signal to be recognized to obtain a plurality of voice signal frames to be recognized;
calculating the sum of squares of each voice signal frame to be recognized to obtain the short-time energy of the voice signal frame to be recognized;
and determining that the short-time energy is larger than an energy threshold value, and collecting the voice signal to be recognized.
5. The smart home control method according to claim 1, wherein the step of collecting the voice signal to be recognized specifically includes:
collecting a voice signal to be recognized;
performing voice framing on the voice signal to be recognized to obtain a plurality of voice signal frames to be recognized;
calculating the short-time average zero-crossing rate of each voice signal frame to be recognized;
and determining that the short-time average zero-crossing rate is greater than a zero-crossing rate threshold value, and collecting the voice signal to be recognized.
6. The smart home control method according to claim 1, wherein the first correlation coefficient is a Pearson correlation coefficient, and the second correlation coefficient is a Pearson correlation coefficient.
7. The smart home control method according to claim 1, wherein the step of collecting the voice signal to be recognized specifically includes:
collecting unprocessed voice signals to be recognized;
and denoising the unprocessed voice signal to be recognized to obtain the voice signal to be recognized.
8. A smart home control system, characterized by comprising:
the voice acquisition module is used for acquiring a voice signal to be recognized;
the voice recognition module is used for carrying out voice recognition on the voice signal to be recognized and extracting a keyword of the voice signal to be recognized;
the authority acquisition module is used for acquiring a control instruction corresponding to the keyword and acquiring an authorized user list corresponding to the control instruction;
the voiceprint recognition module is used for carrying out voiceprint recognition on the voice signal to be recognized and extracting the voice voiceprint characteristics of the voice signal to be recognized;
the authority matching module is used for determining that the authorized user list does not include strangers and acquiring a first voiceprint feature of the authorized user corresponding to the control instruction; performing feature matching on the first voiceprint feature and the voice voiceprint feature to obtain a first correlation coefficient; determining that the first correlation coefficient is larger than a matching threshold value, and generating a smart home control signal according to the keyword; determining that the authorized user list includes strangers, and acquiring a second voiceprint feature of an unauthorized user corresponding to the control instruction; performing feature matching on the second voiceprint feature and the voice voiceprint feature to obtain a second correlation coefficient; determining that the second correlation coefficient is smaller than the matching threshold value, and generating a smart home control signal according to the keyword;
wherein the stranger is a user who does not enter voiceprint features.
9. A smart home control system, characterized by comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is enabled to implement the smart home control method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing the smart home control method according to any one of claims 1 to 7.
CN202110021302.7A 2021-01-08 2021-01-08 Smart home control method, smart home control system and storage medium Active CN112820291B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110021302.7A CN112820291B (en) 2021-01-08 2021-01-08 Smart home control method, smart home control system and storage medium

Publications (2)

Publication Number Publication Date
CN112820291A true CN112820291A (en) 2021-05-18
CN112820291B CN112820291B (en) 2024-05-14

Family

ID=75868777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110021302.7A Active CN112820291B (en) 2021-01-08 2021-01-08 Smart home control method, smart home control system and storage medium

Country Status (1)

Country Link
CN (1) CN112820291B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469785A (en) * 2015-11-25 2016-04-06 南京师范大学 Voice activity detection method in communication-terminal double-microphone denoising system and apparatus thereof
CN108109233A (en) * 2017-12-14 2018-06-01 华南理工大学 Multilevel security protection system based on biological information of human body
CN108335695A (en) * 2017-06-27 2018-07-27 腾讯科技(深圳)有限公司 Sound control method, device, computer equipment and storage medium
CN109412910A (en) * 2018-11-20 2019-03-01 三星电子(中国)研发中心 The method and apparatus for controlling smart home device
CN109671185A (en) * 2017-10-17 2019-04-23 杭州海康威视数字技术股份有限公司 A kind of Door-access control method and device
CN111354357A (en) * 2018-12-24 2020-06-30 中移(杭州)信息技术有限公司 Audio resource playing method and device, electronic equipment and storage medium
US20200227049A1 (en) * 2019-01-11 2020-07-16 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus and device for waking up voice interaction device, and storage medium
CN111599367A (en) * 2020-05-18 2020-08-28 珠海格力电器股份有限公司 Control method, device, equipment and medium for intelligent household equipment

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643701A (en) * 2021-07-27 2021-11-12 广州市威士丹利智能科技有限公司 Method and system for intelligently recognizing voice to control home
CN113643701B (en) * 2021-07-27 2024-04-12 广州市威士丹利智能科技有限公司 Method and system for intelligently recognizing voice to control home
CN114694648A (en) * 2022-02-22 2022-07-01 青岛海尔科技有限公司 Device control method and apparatus, storage medium, and electronic apparatus
WO2023168713A1 (en) * 2022-03-11 2023-09-14 华为技术有限公司 Interactive speech signal processing method, related device and system
CN115903531A (en) * 2022-10-21 2023-04-04 广州视声智能科技有限公司 Intelligent control equipment false touch prevention method and device
CN115424622A (en) * 2022-11-04 2022-12-02 之江实验室 Man-machine voice intelligent interaction method and device
CN116074150A (en) * 2023-03-02 2023-05-05 广东浩博特科技股份有限公司 Switch control method and device for intelligent home and intelligent home
CN116074150B (en) * 2023-03-02 2023-06-09 广东浩博特科技股份有限公司 Switch control method and device for intelligent home and intelligent home
CN116597839A (en) * 2023-07-17 2023-08-15 山东唐和智能科技有限公司 Intelligent voice interaction system and method
CN116597839B (en) * 2023-07-17 2023-09-19 山东唐和智能科技有限公司 Intelligent voice interaction system and method
CN117037771A (en) * 2023-10-08 2023-11-10 深圳市千贝科技有限公司 Zero cold water control method and device based on voice recognition storage
CN117037771B (en) * 2023-10-08 2023-12-22 深圳市千贝科技有限公司 Zero cold water control method and device based on voice recognition storage

Also Published As

Publication number Publication date
CN112820291B (en) 2024-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant