CN112820278A - Household doorbell automatic monitoring method, equipment and medium based on intelligent earphone - Google Patents


Info

Publication number
CN112820278A
CN112820278A
Authority
CN
China
Prior art keywords
sound
timbre
door
face
housekeeping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110093910.9A
Other languages
Chinese (zh)
Inventor
邓利军
肖世飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Meita Industrial Investment Co ltd
Original Assignee
Guangdong Meita Industrial Investment Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Meita Industrial Investment Co ltd filed Critical Guangdong Meita Industrial Investment Co ltd
Priority to CN202110093910.9A
Publication of CN112820278A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C 9/00 Individual registration on entry or exit
    • G07C 9/00174 Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys
    • G07C 9/00563 Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys using personal physical data of the operator, e.g. finger prints, retinal images, voice patterns
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B 3/00 Audible signalling systems; Audible personal calling systems
    • G08B 3/10 Audible signalling systems; Audible personal calling systems using electric transmission; using electromagnetic transmission
    • G08B 3/1008 Personal calling arrangements or devices, i.e. paging systems
    • G08B 3/1016 Personal calling arrangements or devices, i.e. paging systems using wireless transmission
    • G08B 3/1025 Paging receivers with audible signalling details
    • G08B 3/1033 Paging receivers with audible signalling details with voice message alert
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 1/00 Details of transducers, loudspeakers or microphones
    • H04R 1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Electromagnetism (AREA)
  • Alarm Systems (AREA)

Abstract

The invention discloses a household doorbell automatic monitoring method, equipment and medium based on an intelligent earphone. The method comprises the following steps: automatically collecting the ambient sound of a home, automatically identifying whether the ambient sound is a door sound, and, if it is, automatically outputting a prompt voice indicating that someone is knocking at the door.

Description

Household doorbell automatic monitoring method, equipment and medium based on intelligent earphone
Technical Field
The invention relates to the field of earphones, and in particular to an intelligent-earphone-based household doorbell automatic monitoring method, computer equipment and a readable storage medium.
Background
A Bluetooth headset applies Bluetooth technology to a hands-free earpiece, freeing users from annoying cables and letting them talk while moving about. Since its advent, the Bluetooth headset has been an effective efficiency tool for mobile business users.
In the traditional situation, when a user wearing a headset is alone at home in the kitchen, he or she often listens to music while cooking. If someone then knocks at the door or presses the doorbell, the music mixed with the noise of cooking makes it difficult for the wearer to hear the knock or the doorbell, so the convenience of opening the door lock is low.
Therefore, finding a method for improving the convenience of opening the door lock is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The embodiment of the invention provides a household doorbell automatic monitoring method based on an intelligent earphone, computer equipment and a readable storage medium, aiming to solve the problem of low convenience in opening a door lock.
A household doorbell automatic monitoring method based on an intelligent earphone comprises the following steps:
collecting the ambient sound at home by using the microphone of the intelligent earphone;
identifying whether the ambient sound is a house door sound;
if the ambient sound is the house door sound, outputting, through a loudspeaker of the intelligent earphone, a prompt voice indicating that someone is knocking at the door.
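The three claimed steps can be outlined as a minimal sketch in Python. The function names and the stand-in microphone and recognition model below are illustrative placeholders, not part of the patent:

```python
import random

def collect_ambient_sound(n_samples=8000):
    # Placeholder for the smart headset microphone: a real device would
    # return a short frame of audio samples (e.g. 0.5 s of audio).
    return [random.random() for _ in range(n_samples)]

def is_door_sound(sound):
    # Placeholder for the pre-trained door sound recognition model.
    return True

def monitor_once():
    """One pass of the method: collect, identify, prompt."""
    sound = collect_ambient_sound()
    if is_door_sound(sound):
        # In the patent this prompt is played through the headset speaker.
        return "Someone is knocking at the door."
    return None
```

When the recognizer reports no door sound, the loop simply collects the next frame, matching the return-to-collection behavior described later in the embodiments.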
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
According to the intelligent-earphone-based household doorbell automatic monitoring method, computer equipment and readable storage medium, the ambient sound of the home is first collected automatically; whether the ambient sound is a door sound is then identified automatically; finally, if it is, a prompt voice that someone is knocking at the door is output automatically. When the user hears this prompt voice, the door can be opened in time, which improves the convenience of opening the door lock.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
Fig. 1 is a schematic diagram of an application environment of a method for automatically monitoring a home doorbell based on an intelligent headset according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for automatically monitoring a home doorbell based on smart headsets in one embodiment of the present invention;
FIG. 3 is a flowchart of step S20 of the method according to an embodiment of the present invention;
FIG. 4 is a flowchart of step S201 of the method according to an embodiment of the present invention;
FIG. 5 is a flowchart of step S2011 of the method according to an embodiment of the present invention;
FIG. 6 is a flowchart of the time-frequency domain conversion of the ambient sound in the method according to an embodiment of the present invention;
FIG. 7 is a flowchart of a method for obtaining and identifying whether a face image includes an acquaintance face picture according to an embodiment of the invention;
FIG. 8 is a flowchart of step S502 of the method according to an embodiment of the present invention;
FIG. 9 is a flow chart of the method for training a face recognition model according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method provided by the application can be applied to an application environment as shown in fig. 1, where the application environment includes a server and a client, and the client communicates with the server through a wired or wireless network. The client may be, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be implemented by an independent server or a server cluster composed of a plurality of servers. The client is used for collecting the ambient sound and the face image, sending them to the server, and outputting the prompt voice; the server is used for identifying whether the ambient sound is a house door sound, identifying whether the face image contains an acquaintance face picture, and performing time-frequency domain conversion on the ambient sound.
In an embodiment, as shown in fig. 2, an automatic monitoring method for a home doorbell based on an intelligent headset is provided, which is described by taking the application of the automatic monitoring method for a home doorbell based on an intelligent headset to the server in fig. 1 as an example, and includes the following steps:
and S10, collecting the ambient sound at home by adopting a microphone of the intelligent earphone.
Specifically, in order to identify whether a door sound exists in an existing environment, a microphone of the smart headset is required to collect an ambient sound at home, and it can be understood that the microphone of the smart headset can collect a sound in real time or within a preset time period.
The door sound is a doorbell sound sent after a doorbell is pressed, or a door knocking sound of a user.
It should be noted that the preset time period may be 0.5 second or 1 second, and specific contents of the preset time period may be set according to practical applications, which is not limited herein.
And S20, identifying whether the environmental sound is the door sound.
And S30, if the environmental sound is the door sound, outputting a prompt voice of knocking the door by a person by adopting a loudspeaker of the intelligent earphone.
Specifically, when the server acquires the environmental sound collected in step S10, it needs to identify the environmental sound to determine whether the environmental sound is a door sound, that is, identify whether the environmental sound is a door sound through a pre-trained door sound identification model.
When the ambient sound is the house door sound, that is, when the ambient sound includes the house door sound, a prompt instruction that someone is knocking at the door is generated, and the loudspeaker of the intelligent earphone outputs the prompt voice that someone is knocking at the door.
For example, when the ambient sound includes "ding dong", "dong dong" or a similar knocking sound, the speaker of the smart headset outputs a prompt such as "Dear owner, hello, someone is knocking at your door, please go and open it."
The specific contents of the ambient sound are set according to the actual application.
When the ambient sound is not the door sound, that is, when the ambient sound does not include the door sound, the execution returns to step S10.
In the embodiment corresponding to fig. 2, through the above steps S10 to S30, the ambient sound of the home is automatically collected, whether it is a door sound is automatically identified, and, if it is, the prompt voice that someone is knocking at the door is automatically output. When the user hears this prompt voice, the door can be opened in time, which improves the convenience of opening the door lock and thus the intelligence of the smart headset.
In one embodiment, as shown in fig. 3, the step S20 (i.e. identifying whether the environmental sound is a door sound) specifically includes the following steps:
s201, inputting the environmental sound into a pre-trained door sound recognition model to perform door sound recognition processing, and obtaining a sound recognition result.
S202, if the sound recognition result is the type of the door sound, determining that the environmental sound is the door sound.
Specifically, in order to identify whether the ambient sound is a door sound, the server inputs the ambient sound collected in step S10 into a pre-trained door sound recognition model to obtain a sound recognition result; that is, the ambient sound is compared with door sounds collected in advance to obtain a comparison result. If the comparison result is similar, the sound recognition result is determined to be the door sound type; if not, the sound recognition result is determined not to be the door sound type.
And if the sound identification result is the type of the door sound, determining that the environmental sound is the door sound, and if the sound identification result is not the type of the door sound, determining that the environmental sound is not the door sound.
The door sound recognition model is obtained by training historical environmental sounds and historical door sound types serving as samples. The door sound type is a type for identifying door sound, for example, the type of iron door sound, the type of wooden door sound or the type of doorbell sound all belong to the door sound type.
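As a rough illustration only (not the patent's actual model), the comparison with pre-collected door sounds could be sketched as template matching: score the ambient frame against a stored example of each door sound type and report the type whose similarity clears a threshold. The threshold value is an assumption:

```python
import numpy as np

def recognize_door_sound(ambient, templates, threshold=0.8):
    """Compare an ambient frame against pre-collected door sound templates
    and return the best-matching door sound type, or None if no template
    is similar enough (the 0.8 threshold is illustrative)."""
    best_type, best_score = None, 0.0
    a = ambient / (np.linalg.norm(ambient) or 1.0)
    for sound_type, template in templates.items():
        t = template / (np.linalg.norm(template) or 1.0)
        score = float(np.dot(a, t))  # cosine similarity in [-1, 1]
        if score > best_score:
            best_type, best_score = sound_type, score
    return best_type if best_score >= threshold else None
```

In practice the templates would be feature vectors (e.g. spectra) of iron door, wooden door, and doorbell sounds rather than raw waveforms.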
In the embodiment corresponding to fig. 3, through the above steps S201 to S202, since the door sound recognition model is trained by using the correct and true historical environmental sound and the historical door sound type as the sample, the obtained sound recognition result is also correct, thereby improving the accuracy of the door sound recognition.
In an embodiment, as shown in fig. 4, the step S201 (i.e., inputting the ambient sound into the pre-trained door sound recognition model for performing the door sound recognition processing to obtain the sound recognition result) specifically includes the following steps:
s2011, inputting the environmental timbre in the environmental sound into a pre-trained housekeeping timbre recognition model for timbre recognition processing to obtain a timbre recognition result;
and S2012, if the tone color identification result is the housekeeping tone color, determining that the sound identification result is the housekeeping sound type.
In this embodiment, different sounds always show distinctive characteristics in their waveforms: different sounding bodies have different materials and structures, so their vibrations, and hence their timbres, differ. A sounding body can therefore be distinguished by recognizing its timbre.
Specifically, in order to accurately determine whether the ambient sound is a door sound, the server needs to extract the ambient timbre from the ambient sound acquired in step S10, that is, extract the amplitude and phase of the signal from the ambient sound acquired in step S10, for example, extract the ambient timbre from the ambient sound by using the FFT function in MATLAB.
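A numpy equivalent of the FFT extraction mentioned above can be sketched as follows; the sample rate is an assumed value for illustration:

```python
import numpy as np

def extract_spectrum(ambient, sample_rate=8000):
    """Extract the amplitude and phase of each frequency component of the
    ambient sound, the numpy counterpart of MATLAB's fft."""
    spectrum = np.fft.rfft(ambient)
    freqs = np.fft.rfftfreq(len(ambient), d=1.0 / sample_rate)
    return freqs, np.abs(spectrum), np.angle(spectrum)
```

For a pure 440 Hz tone, the amplitude peak lands on the 440 Hz bin, which is how the dominant components of a timbre can be located.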
The extracted environmental timbre is then input into the pre-trained housekeeping timbre recognition model for timbre recognition processing to obtain a timbre recognition result; that is, the extracted environmental timbre is compared with an accurate target housekeeping timbre acquired in advance. If the comparison result is the same, the timbre recognition result is determined to be the housekeeping timbre type; if different, it is determined not to be the housekeeping timbre type.
If the timbre recognition result is the housekeeping timbre type, the sound recognition result is determined to be the house door sound type; if not, the sound recognition result is determined not to be the house door sound type.
The housekeeping timbre recognition model is obtained by training with historical environmental timbres and historical housekeeping timbre types as samples. The housekeeping timbre type identifies the timbre of a house door; for example, the timbre of an iron door, the timbre of a wooden door or the timbre of a doorbell all belong to housekeeping timbre types.
In the embodiment corresponding to fig. 4, through the above steps S2011 to S2012, since the housekeeping timbre identification model is trained by using the correct and real historical environmental timbre and the historical housekeeping timbre type as the sample, the obtained timbre identification result is also correct, thereby improving the accuracy of the housekeeping timbre identification.
In an embodiment, as shown in fig. 5, the step S2011 (i.e., inputting the environmental timbre in the environmental sound into the pre-trained housekeeping timbre recognition model for timbre recognition processing to obtain a timbre recognition result) specifically includes the following steps:
and S20111, acquiring the pre-acquired target housekeeping timbre.
S20112, calculating the similarity between the environment tone and the target housekeeping tone by adopting a cosine similarity method to obtain a similarity value.
And S20113, if the similarity value is greater than or equal to a preset similarity threshold, determining that the tone identification result is the housekeeping tone.
Specifically, in order to accurately identify whether the environmental tone is the housekeeping tone, the server needs to obtain a storage path of a target housekeeping tone acquired in advance from the tone database, and then extract the target housekeeping tone according to the storage path.
It should be noted that the specific content of the tone color database may be a MySQL database or an oracle database, and may be set according to practical applications, which is not limited herein. The target family timbre is accurate and real data acquired manually.
Next, the server calculates the similarity between the environmental tone and the target housekeeping tone by using a cosine similarity method to obtain a similarity value, that is, the environmental tone and the target housekeeping tone are respectively input into the following cosine similarity value calculation formula to obtain the similarity value.
The cosine similarity value calculation formula is specifically as follows:
s = (Σ A_i · B_i) / ( √(Σ A_i²) · √(Σ B_i²) ), summed over i = 1 to n
wherein s is the similarity value, A_i is the i-th component of the environmental timbre vector, B_i is the i-th component of the target housekeeping timbre vector, and n is the total number of components.
If the similarity value is greater than or equal to the preset similarity threshold, the timbre recognition result is determined to be the housekeeping timbre, meaning the environmental timbre is consistent with the housekeeping timbre; if the similarity value is less than the threshold, the timbre recognition result is determined not to be the housekeeping timbre, meaning the two are inconsistent.
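Steps S20112 to S20113 can be sketched directly from the cosine similarity formula; the 0.9 threshold below is illustrative, since the preset similarity threshold is left to the application:

```python
import numpy as np

def timbre_matches(env, target, threshold=0.9):
    """Cosine similarity between the environmental timbre vector and the
    target housekeeping timbre vector, plus the threshold decision."""
    s = float(np.dot(env, target) /
              (np.linalg.norm(env) * np.linalg.norm(target)))
    return s, s >= threshold
```

Identical vectors score 1.0 and match; orthogonal vectors score 0.0 and do not.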
In the embodiment corresponding to fig. 5, through the steps S20111 to S20113, the similarity between the ambient tone and the accurate and real target housekeeping tone is compared to obtain an accurate similarity value, and when the similarity value is greater than or equal to the similarity threshold, the tone identification result is determined to be the housekeeping tone, so that the accuracy of identifying the housekeeping tone is improved.
In an embodiment, as shown in fig. 6, before step S2011, the method further includes performing time-frequency domain conversion on the environmental sound, and specifically includes the following steps:
s401, performing time-frequency domain conversion processing on the environmental sound by adopting a time-frequency domain conversion method to obtain an environmental frequency set.
S402, a band-pass filter is adopted to perform high-frequency filtering processing on the environment frequency set to obtain filtered environment frequency, and meanwhile, the sound signal only containing the filtered environment frequency is determined as environment timbre.
Specifically, in order to accurately obtain the house timbre, the server needs to perform preprocessing on the environmental sound, that is, firstly, a time-frequency domain conversion method is adopted to perform time-frequency domain conversion processing on the environmental sound, so as to obtain an environmental frequency set convenient to analyze, for example, fourier transform is adopted to perform time-frequency domain conversion processing on the environmental sound, so as to obtain an environmental frequency set convenient to analyze.
Then, a band-pass filter is used to perform high-frequency filtering processing on the environment frequency set to obtain a filtered environment frequency, for example, a triangular or sinusoidal filter is used to perform high-frequency filtering processing on the environment frequency set to obtain a filtered clean environment frequency, and finally, a sound signal only containing the filtered environment frequency is determined as an environment timbre.
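The preprocessing of steps S401 to S402 can be sketched with a plain FFT mask standing in for the band-pass filter; the 2 kHz cutoff is an assumption, as the patent does not specify one:

```python
import numpy as np

def remove_high_frequencies(ambient, sample_rate=8000, cutoff_hz=2000):
    """Convert the ambient sound to the frequency domain, zero every
    component above the cutoff, and convert back, yielding the filtered
    signal used as the environmental timbre."""
    spectrum = np.fft.rfft(ambient)
    freqs = np.fft.rfftfreq(len(ambient), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0  # drop the high-frequency content
    return np.fft.irfft(spectrum, n=len(ambient))
```

Mixing a 500 Hz and a 3000 Hz tone and filtering at 2 kHz leaves only the 500 Hz component, which is the "clean environment frequency" the text describes.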
In the embodiment corresponding to fig. 6, through the above steps S401 to S402, the ambient sound is decomposed to obtain signals of all frequencies, and the high frequency signals are removed, so as to obtain a clean ambient tone, thereby improving the accuracy of determining the ambient tone.
In a specific embodiment, as shown in fig. 7, before step S30, the method further includes acquiring and recognizing whether the face image includes an acquaintance face picture, and specifically includes the following steps:
S501, acquiring a face image of a target user transmitted from the image acquisition equipment, wherein the target user is the user making the house door sound.
S502, identifying whether the face image contains an acquaintance face picture.
S503, if the face image contains the face image of the acquaintance, the step of outputting a prompt voice of the person knocking the house door is executed.
S504, outputting the prompt voice that someone is knocking at the door specifically comprises:
outputting, through the loudspeaker of the intelligent earphone, a prompt voice that an acquaintance is knocking at the house door.
Specifically, considering that the person knocking may be a burglar or other bad actor, the face image of the user making the door sound needs to be acquired by an acquisition device installed at the house door, for example a camera; when the face image is acquired, it is sent to the intelligent headset through Bluetooth or a network.
When the intelligent earphone receives the face image, it sends the face image to the server. When the server receives the face image, it identifies whether the face image contains an acquaintance face picture; that is, the face image is matched against the face pictures of acquaintances of the owner wearing the intelligent earphone to obtain a matching result. If the matching result is yes, the face image is determined to contain an acquaintance face picture, and the loudspeaker of the intelligent earphone outputs the prompt voice that an acquaintance is knocking at the house door; if the matching result is no, the face image is determined not to contain an acquaintance face picture.
Here, an acquaintance is a person who is familiar with, a friend of, or related to the owner.
In the embodiment corresponding to fig. 7, through the above steps S501 to S504, when the face image includes the face image of an acquaintance, the prompt voice for the acquaintance to knock the door is output, that is, when the user making a door sound is an acquaintance of the owner wearing the smart headset, the prompt voice for the acquaintance to knock the door is output, so that the safety of the smart headset prompt is improved.
In an embodiment, as shown in fig. 8, the step S502 (i.e. identifying whether the face image includes an acquaintance face picture) specifically includes the following steps:
s5021, inputting the face image into a pre-trained face recognition model to perform face recognition processing, and obtaining a yes or no face recognition result.
And S5022, if the face recognition result is yes, determining that the face image contains the face picture of the acquaintance.
Specifically, in order to accurately identify whether the face image includes an acquaintance face picture, the server needs to input the face image into a pre-trained face identification model to perform face identification processing, so as to obtain a yes or no face identification result, if the face identification result is yes, it is determined that the face image includes the acquaintance face picture, and if the face identification result is no, it is determined that the face image does not include the acquaintance face picture.
The face recognition model is obtained by training with historical face images and historical face recognition results as samples; these are accurate and real data acquired manually in advance. The face recognition model is a deep learning model, which may be a convolutional neural network model or a stacked self-encoding network model; its specific content can be set according to the practical application and is not limited here.
In the embodiment corresponding to fig. 8, through the steps S5021 to S5022, since the face recognition model is obtained by training the correct and real historical face image and the historical face recognition result as samples, the obtained face recognition result is also correct, thereby improving the accuracy of face recognition.
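For illustration only, the yes-or-no decision of step S502 can be mimicked by comparing the captured face against each stored acquaintance picture; a real system would compare embeddings produced by the trained face recognition model rather than raw pixels, and the threshold is an assumption:

```python
import numpy as np

def contains_acquaintance(face_image, acquaintance_pictures, threshold=0.9):
    """Return True when the captured face matches any stored acquaintance
    picture above the threshold (raw-pixel cosine similarity stands in
    for the deep learning model here)."""
    face = face_image.ravel().astype(float)
    for pic in acquaintance_pictures:
        ref = pic.ravel().astype(float)
        s = np.dot(face, ref) / (np.linalg.norm(face) * np.linalg.norm(ref))
        if s >= threshold:
            return True
    return False
```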
In a specific embodiment, as shown in fig. 9, before step S5021, a face recognition model is trained, which specifically includes the following steps:
s601, obtaining a historical face image and a historical face recognition result in a historical database.
And S602, inputting the historical face image serving as a training sample into the deep learning model to obtain a temporary result.
And S603, minimizing the error between the temporary result and the historical face recognition result.
And S604, if the error is within a preset condition range, determining the deep learning model as a trained face recognition model.
Specifically, in order to ensure the accuracy of the face recognition model, the server uses the historical face images and the historical face recognition results as training samples: the historical face image is input into the deep learning model as a training sample to obtain a temporary result, and the hidden layer nodes of the deep learning model are continuously adjusted to minimize the error between the temporary result and the historical face recognition result. If the error is within the preset condition range, the deep learning model is determined to be the trained face recognition model; otherwise, the process returns to step S602 until the error between the temporary result and the historical face recognition result is minimized.
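Steps S601 to S604 amount to an iterative loop that keeps adjusting the model until the error enters the preset range. A toy sketch with a one-layer sigmoid model follows; the tolerance, learning rate, and data are illustrative, not from the patent:

```python
import numpy as np

def train_until_converged(samples, labels, lr=0.1, tol=0.05, max_steps=5000):
    """Adjust the weights of a tiny sigmoid model until the mean squared
    error against the historical recognition results falls inside the
    preset range (tol), mirroring steps S602-S604."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=samples.shape[1])
    error = float("inf")
    for _ in range(max_steps):
        pred = 1.0 / (1.0 + np.exp(-samples @ w))  # temporary result (S602)
        error = float(np.mean((pred - labels) ** 2))
        if error <= tol:  # error within the preset range (S604)
            break
        # Gradient step that shrinks the error (S603).
        grad = samples.T @ ((pred - labels) * pred * (1 - pred)) / len(labels)
        w -= lr * grad
    return w, error
```

The real model adjusts hidden layer nodes of a deep network; this sketch only shows the collect-predict-compare-adjust cycle and the stopping rule.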
In the embodiment corresponding to fig. 9, through the steps S601 to S604, the accurate reality of the training sample is ensured, and the deep learning model is continuously adjusted to make the temporary result consistent with the desired result, thereby improving the accuracy of training the face recognition model.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present invention.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile readable storage medium and an internal memory. The non-volatile readable storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile readable storage medium. The database of the computer device stores data related to the method. The network interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements the household doorbell automatic monitoring method based on an intelligent earphone.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the method in the above embodiments are implemented, for example, steps S10 to S30 shown in fig. 2.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program carries out the method of the above method embodiments. Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to the memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division into functional units and modules is illustrated. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features thereof may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention and are intended to be included within the scope of the present invention.

Claims (10)

1. A household doorbell automatic monitoring method based on an intelligent earphone, characterized by comprising:
collecting ambient sound in the home through a microphone of the intelligent earphone;
identifying whether the ambient sound is a door-knocking sound;
and if the ambient sound is the door-knocking sound, outputting, through a speaker of the intelligent earphone, a prompt voice indicating that someone is knocking at the door.
2. The household doorbell automatic monitoring method based on an intelligent earphone according to claim 1, wherein the identifying whether the ambient sound is a door-knocking sound comprises:
inputting the ambient sound into a pre-trained door-sound recognition model for door-sound recognition processing to obtain a sound recognition result, wherein the door-sound recognition model is obtained by training with historical ambient sounds and historical door-sound types as samples;
and if the sound recognition result is the door-knocking sound, determining that the ambient sound is the door-knocking sound.
3. The household doorbell automatic monitoring method based on an intelligent earphone according to claim 2, wherein the inputting the ambient sound into a pre-trained door-sound recognition model for door-sound recognition processing to obtain a sound recognition result comprises:
inputting the ambient timbre of the ambient sound into a pre-trained door-knocking timbre recognition model for timbre recognition processing to obtain a timbre recognition result, wherein the door-knocking timbre recognition model is obtained by training with historical ambient timbres and historical door-knocking timbre types as samples;
and if the timbre recognition result is the door-knocking timbre, determining that the sound recognition result is the door-knocking sound.
4. The household doorbell automatic monitoring method based on an intelligent earphone according to claim 3, wherein the inputting the ambient timbre of the ambient sound into a pre-trained door-knocking timbre recognition model for timbre recognition processing to obtain a timbre recognition result comprises:
acquiring a pre-collected target door-knocking timbre;
calculating the similarity between the ambient timbre and the target door-knocking timbre using a cosine similarity method to obtain a similarity value;
and if the similarity value is greater than or equal to a preset similarity threshold, determining that the timbre recognition result is the door-knocking timbre.
5. The household doorbell automatic monitoring method based on an intelligent earphone according to claim 3, wherein before the inputting the ambient timbre of the ambient sound into a pre-trained door-knocking timbre recognition model for timbre recognition processing to obtain a timbre recognition result, the method further comprises:
performing time-frequency domain conversion on the ambient sound using a time-frequency domain conversion method to obtain an ambient frequency set;
and performing high-frequency filtering on the ambient frequency set using a band-pass filter to obtain filtered ambient frequencies, and determining the sound signal containing only the filtered ambient frequencies as the ambient timbre.
6. The household doorbell automatic monitoring method based on an intelligent earphone according to any one of claims 1 to 5, wherein before the prompt voice indicating that someone is knocking at the door is output through the speaker of the intelligent earphone, the method further comprises:
acquiring a face image of a target user transmitted by an image acquisition device, wherein the target user is the user making the door-knocking sound;
identifying whether the face image contains a face picture of an acquaintance;
and if the face image contains the face picture of an acquaintance, executing the step of outputting the prompt voice indicating that someone is knocking at the door;
wherein the outputting, through the speaker of the intelligent earphone, the prompt voice indicating that someone is knocking at the door specifically comprises:
outputting, through the speaker of the intelligent earphone, a prompt voice indicating that an acquaintance is knocking at the door.
7. The household doorbell automatic monitoring method based on an intelligent earphone according to claim 6, wherein the identifying whether the face image contains a face picture of an acquaintance comprises:
inputting the face image into a pre-trained face recognition model for face recognition processing to obtain a positive or negative face recognition result;
and if the face recognition result is positive, determining that the face image contains the face picture of an acquaintance.
8. The household doorbell automatic monitoring method based on an intelligent earphone according to claim 7, wherein before the inputting the face image into a pre-trained face recognition model for face recognition processing to obtain a positive or negative face recognition result, the method further comprises:
acquiring a historical face image and a historical face recognition result from a historical database;
inputting the historical face image as a training sample into a deep learning model to obtain a temporary result;
minimizing the error between the temporary result and the historical face recognition result;
and if the error is within a preset condition range, determining the deep learning model to be the trained face recognition model.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the household doorbell automatic monitoring method based on an intelligent earphone according to any one of claims 1 to 8.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the household doorbell automatic monitoring method based on an intelligent earphone according to any one of claims 1 to 8.
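Claims 4 and 5 together describe a concrete matching scheme: convert the ambient sound to the frequency domain, band-pass-filter it to obtain a timbre, then compare that timbre against a pre-collected target timbre by cosine similarity. The sketch below illustrates this under stated assumptions: FFT magnitudes serve as the frequency set, and the sampling rate, pass band, and similarity threshold are illustrative values not fixed by the claims.

```python
import numpy as np

SAMPLE_RATE = 16000          # assumed sampling rate
PASS_BAND = (100.0, 2000.0)  # illustrative pass band for knock energy
SIMILARITY_THRESHOLD = 0.9   # the claimed "preset similarity threshold" (value assumed)

def ambient_timbre(sound: np.ndarray) -> np.ndarray:
    """Claim 5: time-frequency conversion plus band-pass filtering."""
    freqs = np.fft.rfftfreq(len(sound), d=1.0 / SAMPLE_RATE)
    spectrum = np.abs(np.fft.rfft(sound))          # the "ambient frequency set"
    mask = (freqs >= PASS_BAND[0]) & (freqs <= PASS_BAND[1])
    return spectrum * mask                          # keep only the filtered frequencies

def is_door_knock(sound: np.ndarray, target_timbre: np.ndarray) -> bool:
    """Claim 4: cosine similarity against a pre-collected target timbre."""
    timbre = ambient_timbre(sound)
    denom = np.linalg.norm(timbre) * np.linalg.norm(target_timbre)
    if denom == 0.0:
        return False
    similarity = float(timbre @ target_timbre) / denom
    return similarity >= SIMILARITY_THRESHOLD

# Toy check: a 500 Hz "knock" tone against itself and against broadband noise.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
knock = np.sin(2 * np.pi * 500 * t)
target = ambient_timbre(knock)                     # pre-collected target timbre
hiss = np.random.default_rng(1).normal(size=SAMPLE_RATE)
print(is_door_knock(knock, target))                # identical spectrum matches
print(is_door_knock(hiss, target))                 # noise has a different spectral shape
```

Cosine similarity compares only the shape of the band-limited spectrum, not its loudness, which is presumably why the claims compare timbres rather than raw amplitudes.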
CN202110093910.9A 2021-01-23 2021-01-23 Household doorbell automatic monitoring method, equipment and medium based on intelligent earphone Pending CN112820278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110093910.9A CN112820278A (en) 2021-01-23 2021-01-23 Household doorbell automatic monitoring method, equipment and medium based on intelligent earphone


Publications (1)

Publication Number Publication Date
CN112820278A 2021-05-18

Family

ID=75859037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110093910.9A Pending CN112820278A (en) 2021-01-23 2021-01-23 Household doorbell automatic monitoring method, equipment and medium based on intelligent earphone

Country Status (1)

Country Link
CN (1) CN112820278A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113555029A (en) * 2021-07-21 2021-10-26 歌尔科技有限公司 Voice equipment control method, system, medium and voice equipment
CN113781681A (en) * 2021-09-15 2021-12-10 广东好太太智能家居有限公司 Intelligent door lock sound control method and system, intelligent door lock and readable medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1897054A (en) * 2005-07-14 2007-01-17 松下电器产业株式会社 Device and method for transmitting alarm according various acoustic signals
CN105263078A (en) * 2015-10-26 2016-01-20 无锡智感星际科技有限公司 Smart headphone system capable of identifying multiple sound sources and providing diversified prompt warning mechanisms and methods
CN205195931U (en) * 2015-12-22 2016-04-27 王晓晖 Headset
TW201832539A (en) * 2017-02-15 2018-09-01 國立臺北科技大學 Treatment method for doorbell communication
CN109166586A (en) * 2018-08-02 2019-01-08 平安科技(深圳)有限公司 A kind of method and terminal identifying speaker
CN109243442A (en) * 2018-09-28 2019-01-18 歌尔科技有限公司 Sound monitoring method, device and wear display equipment
CN109741747A (en) * 2019-02-19 2019-05-10 珠海格力电器股份有限公司 Voice scene recognition method and device, sound control method and equipment, air-conditioning
CN110365835A (en) * 2019-06-04 2019-10-22 深圳传音控股股份有限公司 A kind of response method, mobile terminal and computer storage medium
CN110654324A (en) * 2018-06-29 2020-01-07 上海擎感智能科技有限公司 Method and device for adaptively adjusting volume of vehicle-mounted terminal
CN110890087A (en) * 2018-09-10 2020-03-17 北京嘉楠捷思信息技术有限公司 Voice recognition method and device based on cosine similarity
CN111388290A (en) * 2020-03-26 2020-07-10 江南大学 Blind person walking aid based on deep learning and embedded development



Similar Documents

Publication Publication Date Title
CN109451188B (en) Method and device for differential self-help response, computer equipment and storage medium
US9640194B1 (en) Noise suppression for speech processing based on machine-learning mask estimation
CA3032807C (en) Call classification through analysis of dtmf events
US20210020018A1 (en) Systems and methods for identifying an acoustic source based on observed sound
WO2021139327A1 (en) Audio signal processing method, model training method, and related apparatus
CN108962237A (en) Mixing voice recognition methods, device and computer readable storage medium
CN107240405B (en) Sound box and alarm method
CN112820278A (en) Household doorbell automatic monitoring method, equipment and medium based on intelligent earphone
WO2016008311A1 (en) Method and device for detecting audio signal according to frequency domain energy
CN110634472B (en) Speech recognition method, server and computer readable storage medium
WO2016090762A1 (en) Method, terminal and computer storage medium for speech signal processing
CN109256139A (en) A kind of method for distinguishing speek person based on Triplet-Loss
CN110769425B (en) Method and device for judging abnormal call object, computer equipment and storage medium
CN107358958B (en) Intercommunication method, apparatus and system
US11996114B2 (en) End-to-end time-domain multitask learning for ML-based speech enhancement
CN112652309A (en) Dialect voice conversion method, device, equipment and storage medium
US20230186943A1 (en) Voice activity detection method and apparatus, and storage medium
CN112133324A (en) Call state detection method, device, computer system and medium
CN111862991A (en) Method and system for identifying baby crying
CN106971712A (en) A kind of adaptive rapid voiceprint recognition methods and system
CN106340310A (en) Speech detection method and device
CN108958699A (en) Voice pick-up method and Related product
CN113113051A (en) Audio fingerprint extraction method and device, computer equipment and storage medium
CN109379499A (en) A kind of voice call method and device
CN108597537A (en) A kind of audio signal similarity detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210518