CN112102546A - Man-machine interaction control method, talkback calling method and related device - Google Patents


Info

Publication number
CN112102546A
CN112102546A (application CN202010790570.0A)
Authority
CN
China
Prior art keywords
information
voice
human body
voice information
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010790570.0A
Other languages
Chinese (zh)
Inventor
林聚财
殷俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010790570.0A priority Critical patent/CN112102546A/en
Publication of CN112102546A publication Critical patent/CN112102546A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G07: CHECKING-DEVICES
    • G07C: TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00: Individual registration on entry or exit
    • G07C9/30: Individual registration on entry or exit not involving the use of a pass
    • G07C9/32: Individual registration on entry or exit not involving the use of a pass in combination with an identity check
    • G07C9/37: Individual registration on entry or exit not involving the use of a pass in combination with an identity check using biometric data, e.g. fingerprints, iris scans or voice recognition
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

The application discloses a man-machine interaction control method, a talkback calling method, and a related device. The man-machine interaction control method comprises: judging whether voice information is detected; if voice information is detected, detecting through a camera device whether human body information is present in a set area; if human body information is present in the set area, judging whether the voice information was uttered by the person corresponding to the human body information; and if the voice information was uttered by that person, recognizing the voice information as a voice instruction and responding to the instruction. With this scheme, the intelligence of the device can be improved while its computational load and power consumption are reduced.

Description

Man-machine interaction control method, talkback calling method and related device
Technical Field
The present application relates to the technical fields of audio/video algorithms and printed circuit boards, and in particular to a human-computer interaction control method, a talkback calling method, and a related device.
Background
In an existing intercom system, taking a building intercom system as an example, a voice collection device is usually arranged at the gate. The device collects voice information from the person in front of the door and then communicates with an intercom device in a preset user's home, so that the person at the door can talk with the user; after the identity of the person has been confirmed, the door can be opened to let the person in.
However, in the prior art a button area usually has to be provided on the voice collection device, and an instruction must be entered through the button area before the device communicates with the intercom device in the preset user's home; the building intercom system is therefore not highly intelligent.
Disclosure of Invention
The application provides a man-machine interaction control method, a talkback calling method, and a related device, so as to solve the problem that control interaction in the prior art is not sufficiently intelligent and effective.
In order to solve the technical problem, the application adopts a technical scheme that: a man-machine interaction control method is provided, and comprises the following steps:
judging whether voice information is detected;
if voice information is detected, whether human body information exists in a set area is detected through a camera device;
if the set area has human body information, judging whether the voice information is sent out by a person corresponding to the human body information;
and if the voice information is sent by the personnel, recognizing the voice information into a voice instruction and responding to the instruction.
In order to solve the above technical problem, another technical solution adopted by the present application is: provided is an intercom calling method, comprising the following steps:
judging whether call information is detected;
if the calling information is detected, detecting whether human body information exists in the access control area through a camera device;
if the human body information exists in the access control area, judging whether the calling information is sent by a person corresponding to the human body information;
if the calling information is sent by the personnel, identifying the calling information and determining the room number of the current call;
and making a call through to the room calling device corresponding to the room number.
In order to solve the above technical problem, another technical solution adopted by the present application is: a human-computer interaction control device is provided, comprising a voice detection module, a human body detection module, a judgment module, and a response module, wherein
the voice detection module is used for judging whether voice information is detected or not;
the human body detection module is used for detecting whether human body information exists in a set area through a camera device when the voice detection module detects voice information;
the judging module is used for judging whether the voice information is sent out by a person corresponding to the human body information when the human body information exists in the set area;
and the response module is used for recognizing the voice information into a voice instruction and responding to the instruction when the voice information is sent by the personnel.
In order to solve the above technical problem, the present application adopts another technical solution: a talkback calling device is provided, comprising a voice detection module, a human body detection module, a judgment module, and a response module, wherein
The voice detection module is used for judging whether call information is detected;
the human body detection module is used for detecting whether human body information exists in the entrance guard area through a camera device when the voice detection module detects the calling information;
the judging module is used for judging whether the calling information is sent by a person corresponding to the human body information when the human body information exists in the access control area;
and the response module is used for identifying the calling information, determining the room number of the current call and making a call through to the room calling device corresponding to the room number when the calling information is sent by the personnel.
In order to solve the above technical problem, the present application adopts another technical solution: an intelligent device is provided, the intelligent device comprising a control circuit, a processor, and a memory coupled to each other, wherein,
the memory is used for storing program instructions for implementing a human-machine interaction control method as described above or a talk-back call method as described above;
the processor is configured to execute the program instructions stored by the memory.
In order to solve the above technical problem, the present application adopts another technical solution: a storage device is provided, storing program data that can be executed to implement the human-machine interaction control method or the talkback calling method described above.
Different from the prior art, the application provides a man-machine interaction control method, a talkback calling method, and a related device. Voice information and image information are used for cooperative detection to confirm that the voice information was uttered by a person; keywords in the voice information are then extracted so that the device can automatically respond to the voice instruction corresponding to the keyword. This improves the intelligence of the device while reducing its computational load and power consumption.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without inventive effort, wherein:
fig. 1 is a schematic flowchart illustrating an embodiment of a human-computer interaction control method provided in the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a human-computer interaction control method provided by the present application;
FIG. 3 is a detailed flowchart of step S210 shown in FIG. 2;
FIG. 4 is a detailed flowchart of step S220 shown in FIG. 2;
FIG. 5 is a flowchart illustrating an embodiment of the step S230 shown in FIG. 2;
fig. 6 is a schematic flowchart illustrating an embodiment of an intercom call method provided in the present application;
FIG. 7 is a schematic structural diagram of an embodiment of a human-computer interaction control apparatus provided in the present application;
fig. 8 is a schematic structural diagram of an embodiment of an intercom calling device provided in the present application;
fig. 9 is a schematic structural diagram of an embodiment of an intelligent terminal provided in the present application;
fig. 10 is a schematic structural diagram of an embodiment of a memory device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort shall fall within the protection scope of the present application.
It should be noted that if directional indications (such as up, down, left, right, front, and back) appear in the embodiments of the present application, they are only used to explain the relative positional relationship and movement of components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change accordingly.
In addition, descriptions such as "first" and "second" in the embodiments of the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features; thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. The technical solutions of the various embodiments may be combined with each other, provided the combination can be realized by a person skilled in the art; when technical solutions are contradictory or a combination cannot be realized, such a combination should be considered not to exist and falls outside the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an embodiment of a human-computer interaction control method provided in the present application.
The man-machine interaction control method specifically comprises the following steps:
s110: and judging whether the voice information is detected.
In this step, sound within a set range can be collected by a voice detection device to obtain sample sound information. The sample sound information is then analyzed to confirm whether it contains voice information uttered by a person. The purpose of this step is to reduce interference from ambient sound.
When collecting the sound within the set range, the sound can be collected by adopting a preset sampling frequency and a preset sampling interval.
The sampling frequency is the number of audio samples taken per unit time; for example, an 8K sampling frequency means 8000 PCM (Pulse Code Modulation) samples per second. Common sampling frequencies include 8K, 16K, 32K, 44.1K, and 48K, with the unit hertz (Hz). A PCM sample is a single audio data point: the original audio is a continuous signal, which can be represented digitally by a computer by collecting many discrete points per unit time.
The sampling interval means that, in order to reduce continuous data sampling, data is sampled only at certain time intervals. When sampling is performed, the sampling frequency can be set as required.
In this embodiment, sound sampling may use a certain sampling interval and a relatively low sampling frequency. The sampling interval may be set to a suitable duration, for example 5 seconds, 10 seconds, or another duration, to ensure that sound is detected in time while frequent detection is reduced. A lower sampling frequency may be used when detection of the presence of sound is started, to reduce the overhead of the detection algorithm (although, other conditions being equal, detection accuracy may be slightly affected). Thus, by sampling less often and at a lower frequency, the power consumption and memory requirement of the device can be reduced; meanwhile, the calculation for recognizing whether voice information is present remains relatively simple, so that the sample sound information can be conveniently and rapidly identified and it can be quickly and accurately judged whether it includes voice information.
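As an illustration of the low-frequency, interval-based sound sampling described above, the following sketch subsamples audio and applies a simple energy threshold to decide whether sound worth analyzing is present. The function names and the energy-threshold criterion are assumptions for illustration; the patent does not prescribe a particular detection algorithm.

```python
import numpy as np

def sample_at_low_rate(signal, original_rate=48000, target_rate=8000):
    """Decimate audio to a lower sampling frequency (e.g. 48 kHz -> 8 kHz)
    by keeping every n-th sample, which reduces detection overhead."""
    step = original_rate // target_rate
    return signal[::step]

def detect_voice(samples, energy_threshold=0.01):
    """Cheap first-stage check: treat the frame as containing sound worth
    analyzing when its mean energy exceeds a threshold."""
    samples = np.asarray(samples, dtype=np.float64)
    energy = float(np.mean(samples ** 2))
    return energy > energy_threshold
```

A real device would replace the bare energy threshold with a proper voice-activity or speech/non-speech classifier; the point is only that this first-stage check stays cheap.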
S120: if the voice information is detected, whether the human body information exists in the set area is detected through the camera device.
When voice information is detected, whether human body information exists in the set area is further detected through the camera device. Whether the human body information exists in the set area can be detected through at least one of a motion detection method, a human face detection method, an infrared light human body detection method and a three-dimensional detection method.
In this step, detecting through the camera device whether human body information exists in the set area specifically means capturing images of the set area with the camera device to obtain sample image information, and then analyzing the sample image information to determine whether a human body is present in it.
Similarly, in this step the set area may be captured at a lower frequency, where the interval for acquiring sample image information may optionally be set equal to the first preset time. As before, this scheme reduces the power consumption and memory requirement of the device, and keeps the calculation for judging whether human body information is present relatively simple. Furthermore, the sample image information may be acquired at a lower frame rate, which also reduces power consumption and memory usage. For example, if the native frame rate of the camera device is 25 fps (frames per second), detection can run below that rate, at 12 fps or 6 fps, to reduce the computational overhead of initial detection; once the relevant data has been detected and confirmed, full-frame-rate processing can be started as needed.
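The reduced-frame-rate detection described above can be sketched as follows. This is a hypothetical illustration: `frames_to_process`, `detect_person_low_rate`, and the callback-style `detector` are assumed names, not part of the patent.

```python
def frames_to_process(total_frames, original_fps=25, detect_fps=6):
    """Indices of the frames actually analyzed when detecting at a reduced
    rate (e.g. 6 fps against a 25 fps camera stream)."""
    step = max(1, round(original_fps / detect_fps))
    return list(range(0, total_frames, step))

def detect_person_low_rate(frames, detector, original_fps=25, detect_fps=6):
    """Run the person detector only on subsampled frames; return the index
    of the first frame in which a person is confirmed (full-rate processing
    would start from there), or None if no person is found."""
    for i in frames_to_process(len(frames), original_fps, detect_fps):
        if detector(frames[i]):
            return i
    return None
```

Processing roughly a quarter of the frames cuts the initial detection cost proportionally, which matches the patent's goal of reducing computation before a person is confirmed.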
S130: and if the human body information exists in the set area, judging whether the voice information is sent out by a person corresponding to the human body information.
When the sample image information has been collected and it has been confirmed that human body information exists in the set area, it is further judged whether the voice information was uttered by the person.
In this step, the voice detection device and the camera device can cooperate to determine whether the voice information was uttered by the person. Sound collection can be continued through the voice detection device while the camera device synchronously captures images; the voice information in the collected sample sound information is then compared with the sample image information, so that it can be judged whether the voice was uttered by the person.
Specifically, in this step the detected sample image information can be recognized to obtain the lip language information (or lip motion information) of the person, and the voice information in the sample sound information can be matched against the lip language information of the person obtained from the sample image information.
If the voice information matches the lip language information, the voice information was uttered by the person; if it does not match, the voice information was not uttered by the person.
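One hedged way to realize the voice/lip matching described above is to compare the time intervals in which speech was detected with the intervals in which lip movement was detected. This is an illustrative sketch: interval-overlap matching, the synchronization tolerance, and the 0.8 ratio are assumptions, since the patent only requires that the voice information match the lip information.

```python
def intervals_overlap(a, b, tolerance=0.2):
    """True when two (start, end) time intervals overlap, with a small
    tolerance (seconds) for audio/video synchronization error."""
    return a[0] <= b[1] + tolerance and b[0] <= a[1] + tolerance

def voice_matches_lips(voice_intervals, lip_intervals, min_ratio=0.8):
    """Attribute the voice to the person when most detected speech
    intervals coincide with some detected lip-movement interval."""
    if not voice_intervals:
        return False
    matched = sum(
        1 for v in voice_intervals
        if any(intervals_overlap(v, l) for l in lip_intervals)
    )
    return matched / len(voice_intervals) >= min_ratio
```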
Before the voice detection device performs further sound collection and the camera device synchronously captures images, a voice prompt can be given to the person in the set area, asking the person to speak, so as to facilitate synchronized sound and image acquisition.
S140: if the voice information is sent by the person, the voice information is recognized into a voice command and the command is responded.
When it is judged that the voice information was uttered by the person, the voice information can be further recognized so that the voice instruction it contains is obtained, and the voice instruction is then responded to.
In this step, keywords of the voice information can be extracted, and the voice instruction corresponding to a keyword can then be responded to.
Responding to the instruction may mean that a device responds to a voice instruction itself; for example, a lighting device may respond to the keyword "turn on light" by turning on a light. It may also mean that a device interacts with other devices in response to the voice instruction; for example, in a building intercom system, a device at the building gate may respond to the voice instruction "connect to xxx building xxx room" or "connect to xxx room", so that the device at the gate is connected to the intercom device in xxx building xxx room or xxx room, and the person at the gate can hold a voice and/or video conversation with the user there.
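The keyword-to-response behavior described above can be sketched as a simple dispatch table. The keywords, actions, and function names here are hypothetical examples, not taken from the patent.

```python
# Hypothetical keyword-to-action table; both keywords and actions are
# illustrative examples, not taken from the patent.
COMMANDS = {
    "turn on light": lambda: "light_on",
    "open door": lambda: "door_open",
}

def respond_to_voice(recognized_text):
    """Extract a known keyword from the recognized speech and run the
    corresponding action; return None when no keyword matches."""
    for keyword, action in COMMANDS.items():
        if keyword in recognized_text:
            return action()
    return None
```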
In this way, voice information and image information are used for cooperative detection, so that it can be confirmed more accurately that the voice information was uttered by a person; keywords in the voice information are then extracted so that the device can automatically respond to the corresponding voice instruction. This improves the intelligence of the device while reducing its computational load and power consumption.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a human-computer interaction control method according to another embodiment of the present application. In this embodiment, the human-computer interaction control method specifically includes the following steps:
s210: and judging whether the voice information is detected.
Similarly, in this step, the voice detection device is used to detect the environmental sound.
Similarly, sound within a certain range can be collected through the voice detection device to obtain sample sound information. The sample sound information is then analyzed to confirm whether it is voice information uttered by a person.
The purpose of this step is to reduce interference from ambient sound. The external sound can be sampled at intervals of a first preset time, where the duration of each sampling can be a second preset time; the first preset time can be 1 second, 2 seconds, 5 seconds, or another duration, and the second preset time can be set according to the actual situation.
In this embodiment, sampling may be performed less often; that is, the first preset time may be set to a longer interval, for example 5 minutes, 10 minutes, or another duration. By reducing the sample acquisition frequency, the power consumption of the device can be reduced and its memory requirement lowered; meanwhile, the calculation for recognizing whether voice information is present remains relatively simple, so that the sample sound information can be quickly identified and it can be rapidly and accurately judged whether it includes voice information.
It should be understood that, in this step, referring to fig. 3, the detecting the environmental sound by the voice detecting apparatus further includes the following steps:
S211: starting the voice detection device.
In this step, the voice detection device may include a microphone. The microphone can be kept in a normally-open state, in which sound is collected continuously.
Alternatively, in other implementations the microphone may be turned on and off intermittently, for example collecting external sound once every preset time interval.
S212, collecting external sound.
In this step, the microphone collects the external sound, so that the sample sound information as described above can be obtained. Similarly, when collecting the sample sound information, a lower sampling frequency (see the above text for details) may be used for the collection.
Optionally, after step S211 and before step S212 collects the external sound, the speech may be further enhanced by a speech enhancement algorithm, so that the collected sample sound information is clearer and easier to identify.
S220: if the voice information is detected, whether the human body information exists in the set area is detected through the camera device.
After the step S210 is completed to determine whether the voice information is detected, if it is recognized that the sample voice information includes the voice information, it is further detected whether there is human body information in the set area by the camera device.
The detection in this step may refer to step S120 described above and is not repeated here.
The difference between this step and step S120 is that the step specifically includes the following steps, please refer to fig. 4:
s221: and starting the camera device.
In this step, when voice information is detected in the sample sound information, the controller can control the camera device to turn on; when no voice information is detected in the sample sound information, the camera device is not turned on, and can be turned off.
Therefore, in this step, the image pickup device is started to operate only after the voice information is detected, so that the power consumption of the apparatus can be reduced.
S222: the set area is detected by the imaging device.
In this step, the camera device may include one or at least two cameras, through which it captures persons in the set area so as to obtain sample image information of the set area.
The camera device can detect the set area by at least one of motion detection, face detection, human body detection, infrared light human body detection or three-dimensional detection.
S223: it is determined whether there is human body information in the set area.
After the sample image information of the set area has been acquired in step S222, it is further analyzed, so that it can be determined whether a person is in the set area.
It should be understood that if the detected sample sound information does not include voice information, no subsequent operation is performed; the process returns to step S210 to continue detecting external sound and judging whether voice information is detected.
S230: and if the human body information exists in the set area, judging whether the voice information is sent out by a person corresponding to the human body information.
In this step, various methods can be adopted to judge whether the voice information is sent out by the person corresponding to the human body information.
Method 1
The first method specifically comprises the following steps:
1. Determine the source region of the voice information through a sound-source localization algorithm.
In this step, the sound source can be localized from the detected voice information, thereby determining the region from which the voice was emitted.
2. Determine the region information of the person from the human body information detected by the camera device.
The specific position of the detected person can then be determined through the camera device. Multiple cameras can be used in cooperation so that the detected position information of the person is more accurate.
3. And judging whether the sound source area is matched with the area information of the personnel.
When the sound source region and the region information of the person have been acquired, it can be confirmed whether they match. Here, matching means that the sound source region is consistent with the region of the person, or that the distance between them is within a preset range.
4. If the sound source region matches the region information of the person, determine that the voice information was uttered by the person.
When the sound source region matches the region information of the person, the voice information can be considered to have been uttered by the person, and the process proceeds to step S240. If they do not match, the voice information may not have been uttered by the person; the person can then be prompted to speak, the voice detection device collects external sound again, and the camera device captures further images of the set area where the person is located. The sound source region and the region information of the person are obtained again from the newly collected sound and images; if they now match, the process proceeds to step S240, and if they still do not match, the process returns to step S210.
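The matching test in step 3 above, identical regions or a distance within a preset range, can be sketched as follows. The two-dimensional coordinates, the unit, and the 0.5 threshold are assumptions for illustration only.

```python
import math

def regions_match(sound_pos, person_pos, max_distance=0.5):
    """Judge that the localized sound source and the detected person occupy
    the same region when their positions coincide or lie within a preset
    distance of each other (units and threshold are assumed)."""
    dx = sound_pos[0] - person_pos[0]
    dy = sound_pos[1] - person_pos[1]
    return math.hypot(dx, dy) <= max_distance
```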
Method two
The second method specifically comprises the following steps:
if the human body information exists, lip language recognition is carried out on the personnel through the video shot by the camera device, and whether the lip language action of the personnel is matched with the voice information or not is determined.
Please refer to fig. 5. The method for recognizing the lip language comprises the following steps:
s231: the lips of the person are inspected.
In this step, the imaging device is first used to detect a setting area to determine whether there is a person in the setting area, and if so, the imaging device further images the lips of the person.
S232: it is confirmed whether the lips of the person can be detected.
However, the camera device may fail to capture the person's lips because of their distance or angle; detection and recognition are therefore required to ensure that clear image information of the person's lips is obtained.
When image information of the person's lips cannot be detected, the person can be reminded by voice broadcast to move closer to the camera device and to face it, so that the obtained lip image information is clear and easy to recognize.
If clear image information of the lips of the person can be detected, the process further proceeds to step S233.
S233: lip tests were performed on the person.
When the camera device detects clear image information of the lips of the person, the lip language of the person can be further identified or lip movement of the person can be further identified through the image information, so that the lip language information of the person or the lip movement information of the person can be obtained.
If the lip language information of the person or the lip movement information of the person is clearly and easily recognized, the process proceeds to step S234.
If the person's lip language information or lip movement information is not clear and easy to recognize, the person can be further reminded by voice broadcast to adjust his or her position relative to the camera device, until clear and easily recognizable lip language or lip movement information is obtained.
S234: and acquiring personnel voice information.
When clearly recognizable lip language information or lip movement of the person is detected, the person's voice information can be acquired.
At this time, the type of the voice detection device needs to be checked, wherein the checking of the type of the voice detection device further includes the following steps:
s235: and judging whether the voice detection equipment is a directional microphone or a microphone array.
If the voice detection device is a directional microphone or a microphone array, the process proceeds to step S236: the overall direction of the sound source can be determined and its specific direction located according to the position of the face or body, and a specific voice beam direction is constructed based on that direction. Sounds other than the person's voice in the detected sound can thus be filtered out or suppressed, so that the acquired voice information is clear and easy to recognize.
If the voice detection device is not a directional microphone or a microphone array, the process goes to step S237: perform normal sound collection, i.e., ordinary external sound collection by the voice detection device.
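The beam steering described in step S236 can be sketched as a simple delay-and-sum beamformer. This is a minimal illustration under stated assumptions, not the patent's implementation: the linear array geometry, the steering angle passed in directly (in practice it would be derived from the detected face or body position), and integer-sample delays are all simplifications.

```python
import math

def delay_and_sum(mic_signals, mic_positions, angle_rad, fs, c=343.0):
    """Steer a linear microphone array toward `angle_rad` by delay-and-sum.

    mic_signals: list of equal-length sample lists, one per microphone.
    mic_positions: microphone x-coordinates in metres along the array axis.
    fs: sampling rate in Hz; c: speed of sound in m/s.
    Returns the beamformed signal (integer-sample delays for simplicity).
    """
    # Per-mic delay so that a plane wave arriving from `angle_rad`
    # adds coherently while sound from other directions is attenuated.
    delays = [int(round(fs * p * math.sin(angle_rad) / c)) for p in mic_positions]
    shift = min(delays)
    delays = [d - shift for d in delays]  # make all delays non-negative
    n = len(mic_signals[0])
    out = [0.0] * n
    for sig, d in zip(mic_signals, delays):
        for i in range(n - d):
            out[i] += sig[i + d]
    return [v / len(mic_signals) for v in out]
```

Steering broadside (angle 0) simply averages the channels; a non-zero angle shifts each channel before summing, which is the mechanism behind the "specific voice beam direction" in S236.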
S240: if the voice information is sent by the person, the voice information is recognized into a voice command and the command is responded.
This step is the same as step S140 described above and is not repeated here.
Based on the same inventive concept, the present application also provides an intercom calling method, please refer to fig. 6. The talkback calling method specifically comprises the following steps:
S310: It is determined whether call information is detected.
In this step, the voice detection device may be used to detect external sound to obtain sample sound information, and the sample sound information is recognized to determine whether it includes call information. The call information may include a room number in the building, such as "room xxx" or "building xxx, room xxx".
In this embodiment, the method for determining whether to detect the call information may be the same as the method for determining whether to detect the voice information in step S110 or step S210, which is not described herein again.
S320: and if the calling information is detected, detecting whether the human body information exists in the access control area through the camera device.
And after the calling information is detected, further detecting whether the human body information exists in the access control area through the camera device.
Likewise, the specific method in this step may also be the same as the method in step S120 or step S220 described above.
S330: and if the human body information exists in the access control area, judging whether the calling information is sent out by a person corresponding to the human body information.
And if the human body information exists in the access control area, judging whether the calling information is sent out by a person corresponding to the human body information.
Likewise, the specific method of this step is the same as the method in step S130 or step S230 described above.
S340: if the calling information is sent by a person, the calling information is identified, and the room number of the current call is determined.
When the call information is confirmed to have been uttered by a person, the call information can be recognized to determine the room number of the current call. For example, the room number may be obtained by extracting a keyword from the call information; when the person's call information includes "call room xxx" or "connect xxx", it can be determined that the room number of the current call is xxx.
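The keyword extraction described above can be sketched with a few regular expressions. The patterns and phrase forms ("call room xxx", "connect xxx", "building xxx room xxx") are hypothetical English stand-ins for whatever text the speech recognizer actually produces:

```python
import re

# Hypothetical keyword patterns; real phrases would come from the
# speech-recognition result of the detected call information.
CALL_PATTERNS = [
    re.compile(r"call\s+(?:room\s+)?(\d+)"),
    re.compile(r"connect\s+(?:room\s+)?(\d+)"),
    re.compile(r"building\s+(\d+)\s+room\s+(\d+)"),
]

def extract_room_number(call_text):
    """Return the room number string found in `call_text`, or None."""
    text = call_text.lower()
    for pat in CALL_PATTERNS:
        m = pat.search(text)
        if m:
            # Join multiple captures, e.g. "3-502" for building 3, room 502.
            return "-".join(m.groups())
    return None
```

With the room number extracted, the device can connect the call to the corresponding room calling device as in step S350.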
S350: and making a call through to the room calling device corresponding to the room number.
After the room number is acquired, the call is connected to the room calling device corresponding to the room number, so that the person in the access control area can communicate directly by video and/or voice with the user in the room corresponding to the room number.
Based on the same inventive concept, the application also provides a human-computer interaction control device.
Referring to fig. 7, the human-computer interaction control device 40 includes a voice detection module 410, a human body detection module 420, a determination module 430, and a response module 440. The voice detection module 410, the human body detection module 420, the judgment module 430 and the response module 440 are coupled, so as to implement the human-computer interaction control method as described above.
Specifically, the voice detection module 410 includes the voice detection device as described above, so as to determine whether the voice information is detected.
The voice detection module 410 may collect sound within a certain range, so that sample sound information can be acquired. The sample sound information is then detected to confirm whether it contains voice information uttered by a person.
The voice detection module 410 may sample the sound in the environment at intervals of a first preset time, and the duration of each external sound sampling may be a second preset time. The first preset time may be 1 second, 2 seconds, 5 seconds or another duration, and the second preset time may be set according to the actual situation.
In this embodiment, the voice detection module 410 may sample sound at a lower sampling frequency, i.e., the first preset time may be set to a longer duration, for example 5 minutes, 10 minutes or another duration. Reducing the sample acquisition frequency in this way reduces both the power consumption and the memory requirement of the device; meanwhile, the calculation for recognizing whether voice information is present remains relatively simple, so that the sample sound information can be quickly recognized and judged, and whether the sample information includes voice information can be determined quickly and accurately.
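The low-frequency sampling scheme (capture for the second preset time, then idle for the first preset time) can be sketched as a duty-cycled loop. `capture_fn` and `detect_fn` are hypothetical stand-ins for the device's sound capture and lightweight voice check; they are not part of the patent:

```python
import time

def duty_cycled_capture(capture_fn, detect_fn, idle_s=5.0, window_s=1.0,
                        max_cycles=None):
    """Duty-cycled sound sampling: capture a `window_s`-second buffer
    (the "second preset time"), run the cheap voice check, then idle
    for `idle_s` seconds (the "first preset time").

    capture_fn(window_s) -> buffer; detect_fn(buffer) -> bool.
    Returns the first buffer judged to contain voice, or None after
    `max_cycles` attempts.
    """
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        buffer = capture_fn(window_s)
        if detect_fn(buffer):
            return buffer  # wake the heavier recognition pipeline only on a hit
        time.sleep(idle_s)  # long idle keeps power draw and memory use low
        cycle += 1
    return None
```

A longer `idle_s` trades detection latency for lower power consumption, which is the design choice the paragraph above describes.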
The human body detection module 420 is configured to detect whether there is human body information in the set area through the image capturing device when the voice detection module 410 detects the voice information.
When the voice detection module 410 detects the voice information, the human body detection module 420 further detects whether there is human body information in the setting area through the camera device. The camera device can detect whether the human body information exists in the set area through at least one of a motion detection method, a human face detection method, an infrared light human body detection method and a three-dimensional detection method.
The method comprises the steps of detecting whether human body information exists in a set area through a camera device, specifically, carrying out image acquisition on the set area through the camera device so as to obtain sample image information, and then detecting the sample image information so as to determine whether the human body information exists in the sample image information.
Similarly, the human body detection module 420 may perform image acquisition on the set area with a lower frequency through the camera device, wherein the optional interval time for acquiring the sample image information may be set to be the same as the first preset time. Similarly, by adopting the scheme, the power consumption of the equipment can be reduced, and the memory requirement of the equipment can be reduced; meanwhile, the judgment and calculation of whether the human body information exists can be relatively simple. Furthermore, the camera device in the scheme can also acquire the sample image information at a lower frame rate, and can also reduce the power consumption of the equipment and reduce the memory requirement of the equipment.
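The motion-detection option mentioned above can be illustrated with a simple frame-differencing check. This is a generic sketch, not the device's actual detector, and it assumes grayscale frames flattened to equal-length lists of pixel values:

```python
def motion_detected(prev_frame, cur_frame, pixel_thresh=25, ratio_thresh=0.01):
    """Frame-difference motion check on two equal-size grayscale frames
    (lists of pixel values 0-255). Returns True when the fraction of
    pixels that changed by more than `pixel_thresh` exceeds `ratio_thresh`.
    """
    changed = sum(1 for a, b in zip(prev_frame, cur_frame)
                  if abs(a - b) > pixel_thresh)
    return changed / len(cur_frame) > ratio_thresh
```

Because the check is a single pass over two frames, it stays cheap even at the low frame rates and long acquisition intervals described above.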
The determining module 430 is configured to determine whether the voice information is sent by a person corresponding to the human body information when there is human body information in the set area.
When the human body detection module 420 finishes collecting the sample image information and confirms that there is human body information in the set area (i.e., that a person is present in the set area), the determination module 430 can judge whether the voice information was uttered by that person.
Specifically, the voice detection module 410 further performs voice acquisition to obtain sample voice information, and the human body detection module 420 performs image acquisition to obtain sample image information synchronously through the camera device. The judging module 430 can compare the voice information in the acquired sample sound information with the sample image information, and further judge whether the voice is uttered by the person.
The determining module 430 may recognize the detected sample image information to obtain the lip language information (or lip movement information) of the person, and then match the voice information in the sample sound information with the lip language information of the person obtained from the sample image information.
If the voice message is matched with the lip language message, the voice message is sent by the person; if the voice message does not match the lip language message, the voice message is not sent by the person.
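One simple way to realize this matching is to compare when speech is heard with when lip motion is seen, and require the two to overlap in time. Below is a minimal sketch under that assumption; the interval lists (in seconds) are hypothetical outputs of the voice and lip detectors, not an API the patent defines:

```python
def voice_matches_lips(voice_segs, lip_segs, min_overlap=0.5):
    """Decide whether detected speech was uttered by the person in frame by
    checking how much of the voiced time coincides with lip movement.

    voice_segs / lip_segs: lists of (start, end) pairs in seconds.
    Returns True when at least `min_overlap` of the voiced time
    overlaps observed lip motion.
    """
    voiced = sum(end - start for start, end in voice_segs)
    if voiced == 0:
        return False
    overlap = 0.0
    for vs, ve in voice_segs:
        for ls, le in lip_segs:
            # Length of the intersection of the two intervals, if any.
            overlap += max(0.0, min(ve, le) - max(vs, ls))
    return overlap / voiced >= min_overlap
```

Speech that arrives while the person's lips are still is rejected, which is the behavior described in the paragraph above: the voice information is treated as uttered by the person only when it matches the lip information.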
Further, the human-computer interaction control device 40 may also include a voice prompt module. Before the voice detection module 410 performs sound collection and the human body detection module 420 synchronously performs image collection through the camera device, the voice prompt module may give a voice prompt to the person in the set area, prompting the person to speak, so as to facilitate synchronized sound and image collection.
The response module 440 is configured to recognize the voice information as a voice instruction and respond to the voice instruction when the determining module 430 determines that the voice information is sent by a person.
In this step, the response module 440 may be a functional module of the human-computer interaction control device 40, wherein the functional module may implement a certain function according to the voice command. For example, the response module 440 may be a lighting module, i.e., the response module 440 may turn on the light or turn off the light to stop the lighting in response to the voice command.
Based on the same inventive concept, the present application also provides an intercom calling device, please refer to fig. 8.
The intercom call device 50 includes a voice detection module 510, a human body detection module 520, a judgment module 530 and a call connection module 540.
The voice detection module 510 is configured to determine whether call information is detected; wherein the speech detection module 510 may be the same as the speech detection module 410 described above.
The human body detection module 520 is configured to detect whether there is human body information in the access control area through the camera device when the voice detection module 510 detects the call information. The human body detection module 520 may be the same as the human body detection module 420 described above.
The determining module 530 is configured to determine whether the call information is sent by a person corresponding to the human body information when there is human body information in the access control area. The determining module 530 may be the same as the determining module 430 described above.
The call connection module 540 is configured to identify the call information, determine a room number of a current call, and connect a call to a room calling device corresponding to the room number when the call information is sent by a person. The call completing module 540 may be one of the response modules 440 described above.
Based on the same inventive concept, the application also provides an intelligent terminal, please refer to fig. 9.
The intelligent terminal 60 comprises a control circuit 601, a memory 602 and a processor 603, which are coupled to each other, wherein the control circuit 601 is configured to receive a control instruction of a user, the memory 602 is configured to store program data, and the processor 603 executes the program data, so as to implement a human-machine interaction control method or a talkback call method as described above.
Based on the same inventive concept, the present application further provides a memory device, please refer to fig. 10, where fig. 10 is a schematic structural diagram of an embodiment of the memory device provided in the present application. The storage device 70 stores therein program data 71, and the program data 71 may be a program or instructions that can be executed to implement any one of the man-machine interaction control methods or the intercom call method described above.
In one embodiment, the storage device 70 may be a memory chip in a terminal, a hard disk, or another readable and writable storage tool such as a mobile hard disk, a flash disk or an optical disk, and may also be a server or the like.
In the embodiments provided in the present invention, it should be understood that the disclosed method and apparatus can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a processor or a memory is merely a logical division, and an actual implementation may have another division, for example, a plurality of processors and memories may be combined to implement the functions or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or connection may be an indirect coupling or connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be substantially or partially implemented in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In summary, the present application provides a human-computer interaction control method, an intercom call method and related devices. Voice information and image information are used for cooperative detection to confirm that the voice information was uttered by a person, and a keyword is extracted from the voice information so that the device can automatically respond to the voice instruction corresponding to the keyword. This improves the intelligence of the device while reducing its calculation load and power consumption.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (11)

1. A human-computer interaction control method is characterized by comprising the following steps:
judging whether voice information is detected;
if voice information is detected, detecting whether human face or human body information exists in a set area through a camera device;
if the set area has human body information, judging whether the voice information is sent out by a person corresponding to the human body information;
and if the voice information is sent by the personnel, recognizing the voice information into a voice instruction and responding to the instruction.
2. The human-computer interaction control method according to claim 1,
if the set area has human body information, the step of judging whether the voice information is sent out by the person corresponding to the human face or the human body information comprises the following steps:
determining the voice information sounding source area through a sound source positioning algorithm according to the voice information;
determining the regional information of the person through the human body information detected in the camera device;
judging whether the sound source area is matched with the area information of the personnel;
and if the voice information is matched with the voice information, determining that the voice information is sent by the personnel.
3. The human-computer interaction control method according to claim 2, wherein the step of recognizing the voice information as a voice command and responding to the command further comprises:
and filtering the sound with the distance or the pointing range between the sound source area and the area information of the personnel out of a preset range so as to prevent the interference of unnecessary information.
4. The human-computer interaction control method according to claim 1,
if voice information is detected, the step of detecting whether human faces or human body information exists in the set area through the camera device comprises the following steps:
if voice information is detected, video shooting is carried out on the set area through the camera device, and whether human face or human body information exists in the set area is detected;
if the set area has face or human body information, the step of judging whether the voice information is sent by a person corresponding to the face or human body information comprises the following steps:
if the face or human body information exists, performing lip detection or lip recognition on the personnel through the video shot by the camera device, and determining whether the lip action of the personnel is consistent with the sound source positioning direction or whether the lip action is matched with the voice information;
and if the voice information is matched with the voice information, determining that the voice information is sent by the personnel.
5. The human-computer interaction control method according to claim 1, wherein the step of detecting whether human body information exists in the set area through the camera if voice information is detected comprises:
and if the voice information is detected, determining whether the human body information exists in the set area by using at least one of a motion detection method, a human face detection method, an infrared light human body detection method and a three-dimensional stereo detection method through the camera device.
6. The human-computer interaction control method according to claim 1, wherein the step of determining whether voice information is detected comprises:
collecting sound information in the set area by adopting a set sampling frequency;
and judging the collected sound information to confirm whether the sound information has the voice information.
7. An intercom calling method, characterized in that the calling method comprises:
judging whether call information is detected;
if the calling information is detected, detecting whether human body information exists in the access control area through a camera device;
if the human body information exists in the access control area, judging whether the calling information is sent by a person corresponding to the human body information;
if the calling information is sent by the personnel, identifying the calling information and determining the room number of the current call;
and making a call through to the room calling device corresponding to the room number.
8. A human-computer interaction control device is characterized by comprising a voice and human body detection module, a judgment module and a response module,
the voice detection module is used for judging whether voice information is detected or not;
the human body detection module is used for detecting whether human body information exists in a set area through a camera device when the voice detection module detects voice information;
the judging module is used for judging whether the voice information is sent out by a person corresponding to the human body information when the human body information exists in the set area;
and the response module is used for recognizing the voice information into a voice instruction and responding to the instruction when the voice information is sent by the personnel.
9. A talkback calling device, characterized by comprising a voice detection module, a human body detection module, a judgment module and a call connection module;
The voice detection module is used for judging whether call information is detected;
the human body detection module is used for detecting whether human body information exists in the entrance guard area through a camera device when the voice detection module detects the calling information;
the judging module is used for judging whether the calling information is sent by a person corresponding to the human body information when the human body information exists in the access control area;
and the call connection module is used for identifying the calling information, determining the room number of the current call and putting the call through to the room calling device corresponding to the room number when the calling information is sent by the personnel.
10. An intelligent terminal, characterized in that the intelligent terminal comprises: a control circuit, a processor and a memory coupled to each other, wherein,
the memory is used for storing program instructions for implementing the human-computer interaction control method according to any one of claims 1-6 or the talkback call method according to claim 7;
the processor is configured to execute the program instructions stored by the memory.
11. A storage device, characterized in that it stores program data executable to implement the program instructions of the human-machine interaction control method according to any one of claims 1 to 6 or the talk-back call method according to claim 7.
CN202010790570.0A 2020-08-07 2020-08-07 Man-machine interaction control method, talkback calling method and related device Pending CN112102546A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010790570.0A CN112102546A (en) 2020-08-07 2020-08-07 Man-machine interaction control method, talkback calling method and related device


Publications (1)

Publication Number Publication Date
CN112102546A true CN112102546A (en) 2020-12-18

Family

ID=73752757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010790570.0A Pending CN112102546A (en) 2020-08-07 2020-08-07 Man-machine interaction control method, talkback calling method and related device

Country Status (1)

Country Link
CN (1) CN112102546A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205632A (en) * 2021-04-27 2021-08-03 内蒙古科电数据服务有限公司 Internet of things equipment safety access method suitable for electric power operation site
CN113486760A (en) * 2021-06-30 2021-10-08 上海商汤临港智能科技有限公司 Object speaking detection method and device, electronic equipment and storage medium
CN115620440A (en) * 2022-12-06 2023-01-17 湖南三湘银行股份有限公司 Novel data center safety protection device

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030122652A1 (en) * 1999-07-23 2003-07-03 Himmelstein Richard B. Voice-controlled security with proximity detector
CN104103274A (en) * 2013-04-11 2014-10-15 纬创资通股份有限公司 Speech processing apparatus and speech processing method
CN105159111A (en) * 2015-08-24 2015-12-16 百度在线网络技术(北京)有限公司 Artificial intelligence-based control method and control system for intelligent interaction equipment
US20170110123A1 (en) * 2015-10-16 2017-04-20 Google Inc. Hotword recognition
CN107346661A (en) * 2017-06-01 2017-11-14 李昕 A kind of distant range iris tracking and acquisition method based on microphone array
CN107404381A (en) * 2016-05-19 2017-11-28 阿里巴巴集团控股有限公司 A kind of identity identifying method and device
CN110047184A (en) * 2019-04-24 2019-07-23 厦门快商通信息咨询有限公司 A kind of auth method and access control system of anti-recording attack
CN110211267A (en) * 2019-06-03 2019-09-06 浙江大华技术股份有限公司 Indoor openings control method, the configuration method of permission, device and storage medium
CN110322596A (en) * 2018-03-30 2019-10-11 上海擎感智能科技有限公司 Boot method of controlling switch and system based on position identification and speech recognition
CN110853619A (en) * 2018-08-21 2020-02-28 上海博泰悦臻网络技术服务有限公司 Man-machine interaction method, control device, controlled device and storage medium
CN111341350A (en) * 2020-01-18 2020-06-26 南京奥拓电子科技有限公司 Man-machine interaction control method and system, intelligent robot and storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Ding Aiping: "Introduction to the Internet of Things", 31 March 2017 *
Mi Chao, Shen Yang, Mi Weijian: "Machine Vision for Loading and Unloading and Its Applications", 31 January 2016 *
Robert F. Carr: "Design of Nursing Homes and Rehabilitation Centers", 31 August 2014 *


Similar Documents

Publication Publication Date Title
CN112102546A (en) Man-machine interaction control method, talkback calling method and related device
CN110291489B (en) Computationally efficient human identification intelligent assistant computer
US11483657B2 (en) Human-machine interaction method and device, computer apparatus, and storage medium
EP3627290A1 (en) Device-facing human-computer interaction method and system
JP4729927B2 (en) Voice detection device, automatic imaging device, and voice detection method
JP6994292B2 (en) Robot wake-up methods, devices and robots
CN108766438B (en) Man-machine interaction method and device, storage medium and intelligent terminal
CN110730115B (en) Voice control method and device, terminal and storage medium
CN109920419B (en) Voice control method and device, electronic equipment and computer readable medium
CN109817211B (en) Electric appliance control method and device, storage medium and electric appliance
CN112151029A (en) Voice awakening and recognition automatic test method, storage medium and test terminal
CN109373518B (en) Air conditioner and voice control device and voice control method thereof
CN110096251B (en) Interaction method and device
CN109448705B (en) Voice segmentation method and device, computer device and readable storage medium
CN111724780B (en) Equipment wake-up method and device, electronic equipment and storage medium
CN108847221B (en) Voice recognition method, voice recognition device, storage medium and electronic equipment
CN110767214A (en) Speech recognition method and device and speech recognition system
CN111583937A (en) Voice control awakening method, storage medium, processor, voice equipment and intelligent household appliance
CN110716444A (en) Sound control method and device based on smart home and storage medium
CN112286364A (en) Man-machine interaction method and device
CN111326152A (en) Voice control method and device
CN111370004A (en) Man-machine interaction method, voice processing method and equipment
CN111090412B (en) Volume adjusting method and device and audio equipment
CN212160784U (en) Identity recognition device and entrance guard equipment
CN109271480B (en) Voice question searching method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201218