WO2022143300A1

WO2022143300A1 - Visitor talkback control method, talkback control apparatus, system, electronic device, and storage medium

Info

Publication number: WO2022143300A1
Application number: PCT/CN2021/140086
Authority: WO
Inventors: 钟浩华
Original assignee: Oppo广东移动通信有限公司
Priority date: 2020-12-31
Filing date: 2021-12-21
Publication date: 2022-07-07
Also published as: CN114697611A; CN114697611B

Abstract

The embodiments of the present application disclose a visitor talkback control method, a talkback control apparatus, a system, an electronic device, and a storage medium. The talkback control method is applied to an electronic device, and comprises: determining image data acquired in a talkback process or a talkback request process, the image data comprising image data of a talkback request end and/or image data of a talkback receiving end; and when the image data satisfies a first preset condition, triggering the talkback process to end or triggering the talkback request process to end, thereby improving convenience of talkback control.

Description

Visiting intercom control method, intercom control device, system, electronic equipment and storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 202011629375.6, titled "Visitor Intercom Control Method, Intercom Control Device, System, Electronic Equipment and Storage Medium" and filed with the China Patent Office on December 31, 2020, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present application relates to the field of communication control, and more particularly, to a visiting intercom control method, an intercom control device, a system, an electronic device and a storage medium.

BACKGROUND

As household items become more intelligent, more and more smart home products are networked to form a smart home for users. A smart doorbell, for example, can perform functions such as video shooting, monitoring, and intercom, and can also be linked with other home products. However, current doorbell products are still somewhat inconvenient to use during the intercom process.

SUMMARY OF THE INVENTION

Embodiments of the present application provide a visiting intercom control method, an intercom control device, a system, an electronic device, and a storage medium, which improve the convenience of intercom control.

In a first aspect, an embodiment of the present application provides a visiting intercom control method, which is applied to an electronic device. The method includes: determining image data collected during an intercom process or an intercom request process, where the image data includes image data of an intercom requesting end and/or image data of an intercom receiving end; and when the image data satisfies a first preset condition, triggering the intercom process to end or triggering the intercom request process to end.

In a second aspect, an embodiment of the present application provides a visiting intercom control method, which is applied to an electronic device. The method includes: determining image data collected during the intercom process and intercom voice collected during the intercom process, where the image data includes image data of the intercom requesting end and/or image data of the intercom receiving end, and the intercom voice includes the voice of the intercom requesting end and/or the voice of the intercom receiving end; and when the image data satisfies a first preset condition and the intercom voice satisfies a second preset condition, triggering the intercom process to end.

In a third aspect, an embodiment of the present application provides a visiting intercom control device. The intercom control device includes: a determining unit configured to determine image data collected during an intercom process or an intercom request process, where the image data includes image data of the intercom requesting end and/or image data of the intercom receiving end; and a triggering unit configured to trigger the end of the intercom process or the end of the intercom request process when the image data satisfies a first preset condition.

In a fourth aspect, an embodiment of the present application provides a visiting intercom control device. The intercom control device includes: a determining unit configured to determine image data collected during the intercom process and intercom voice collected during the intercom process, where the image data includes image data of the intercom requesting end and/or image data of the intercom receiving end, and the intercom voice includes the voice of the intercom requesting end and/or the voice of the intercom receiving end; and a triggering unit configured to trigger the end of the intercom process when the image data satisfies a first preset condition and the intercom voice satisfies a second preset condition.

In a fifth aspect, an embodiment of the present application provides a visiting intercom system. The system includes a doorbell device and a TV, and the doorbell device is connected to the TV. The doorbell device or the TV is configured to determine image data collected during an intercom process or an intercom request process, where the image data includes image data of the intercom requesting end collected by the doorbell device and/or image data of the intercom receiving end collected by the TV. The doorbell device or the TV is further configured to trigger the end of the intercom process or the end of the intercom request process when the image data satisfies a first preset condition.

In a sixth aspect, embodiments of the present application provide an electronic device, including: one or more processors; a memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to perform the method of any one of the first aspect or the second aspect.

In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium, where program code is stored in the computer-readable storage medium, and the program code can be called by a processor to execute the method of any one of the first aspect or the second aspect.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, wherein:

FIG. 1 is a schematic flowchart of a visiting intercom control method provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of another visiting intercom control method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of an electronic device provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a computer-readable storage medium provided by an embodiment of the present application;

FIG. 5 is a block diagram of functional units of a visiting intercom control device provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of the architecture of a visiting intercom system provided by an embodiment of the present application;

FIG. 7 is a schematic flowchart of a visiting intercom system provided by an embodiment of the present application.

DETAILED DESCRIPTION

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present application.

In the description of the present application, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only, and should not be construed as indicating or implying relative importance. For those of ordinary skill in the art, the specific meanings of the above terms in this application can be understood in specific situations. Also, in the description of the present application, unless otherwise specified, "a plurality" means two or more. "And/or", which describes the association relationship of the associated objects, means that there can be three kinds of relationships, for example, A and/or B, which can mean that A exists alone, A and B exist at the same time, and B exists alone. The character "/" generally indicates that the associated objects are an "or" relationship. The step numbers in the present application are only used for example, may correspond to different embodiments, and the sequence is not limited unless there is conflict.

An embodiment of the present application provides a visiting intercom control method, which is applied to an electronic device. The electronic device includes at least one of an intercom requesting end device, an intercom receiving end device, and a cloud server. The intercom requesting end device may be understood as the outdoor device in the visiting intercom, which is mainly used to control and initiate the intercom. It may include at least one of a doorbell outdoor unit (which may include an image acquisition unit), a camera, and an access control outdoor unit, where the camera may be a cat-eye camera, a surveillance camera, or the like, which is not limited here. The intercom receiving end device may be understood as the indoor device in the visiting intercom, such as smart home equipment, which is mainly used to control receiving the intercom. It may include at least one of a doorbell indoor unit, an access control indoor unit, a TV, a router, a gateway device, customer premise equipment (CPE), a speaker, a smart camera, a TV box, a computer, and a mobile phone.

It is understandable that when someone visits, an intercom request is initiated. Generally, the intercom requesting end device initiates the intercom request, and the intercom receiving end device then accepts the request to establish the intercom. If no user accepts the intercom request, the request generally ends after a set period of time. If the visitor leaves but the request remains active within that set time, resources are wasted or the surrounding environment is disturbed unnecessarily. The intercom can be a voice intercom or a video intercom. A video intercom can be single-party, in which only one party displays video images, or two-party or multi-party, in which multiple parties display video images. During an intercom, the user generally has to end the intercom manually, for example by pressing a button, which is inconvenient, especially when the user is far from the intercom device. It should be noted that the user may be a visited user or a visiting user (e.g., a visitor).

To make the intercom more convenient for the user, please refer to FIG. 1. A visiting intercom control method provided by an embodiment of the present application includes: Step S10. Determine the image data collected during the intercom process or the intercom request process, where the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end;

It should be noted that the intercom request process can be understood as the process in which the visiting user has initiated the intercom but the visited user has not yet accepted it. In this stage, the intercom requesting end device or the intercom receiving end device generates a prompt, such as a voice or image prompt, to request the visited user at the intercom receiving end to accept the intercom and to wait for that acceptance. For example, step S10 includes: S101. Determine the image data collected during the intercom request process, where the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end. After perceiving the prompt, the visited user at the intercom receiving end can accept the intercom through the intercom receiving end device, which can be understood as entering the intercom process. That is, the intercom process can be understood as the stage, after the visited user accepts the request, in which both parties can talk to each other through voice, video, and other means. Exemplarily, step S10 includes: S102. Determine the image data collected during the intercom process, where the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end. In either case, the image data may include only the image data of the intercom requesting end, only the image data of the intercom receiving end, or both.

In addition, the image data of the intercom requesting end generally refers to image data collected by the image acquisition unit of the intercom requesting end device, and the image data of the intercom receiving end generally refers to image data collected by the image acquisition unit of the intercom receiving end device. It should be noted that the image data may also be collected by other devices connected to the intercom receiving end device or the intercom requesting end device, and that there may be multiple intercom receiving end devices. In determining the image data collected during the intercom process or the intercom request process, an image acquisition unit (at either the requesting end or the receiving end) may collect the image data, and the intercom requesting end device, the intercom receiving end device, the cloud server, or the like may then acquire it. On the intercom requesting end device or the intercom receiving end device, the image data may be acquired by the CPU or GPU. The image data may be transmitted by wire or wirelessly. The device that collects the image data may be different from the device that executes S10. In addition, the image data determined in S10 may be part or all of the image data collected during the intercom process or the intercom request process.

It can be understood that during the intercom process and/or the intercom request process, the image acquisition unit of the intercom requesting end device can collect the image data of the intercom requesting end in real time, and the image data of the intercom receiving end can be collected in real time by the image acquisition unit of the intercom receiving end device during the intercom process. The image data of the intercom requesting end is generally the image data that the image acquisition unit of the intercom requesting end device can collect, such as image data within the range that its camera can capture. Likewise, the image data of the intercom receiving end is generally the image data that the image acquisition unit of the intercom receiving end device can collect, such as image data within the range that a TV camera can capture. The intercom requesting end device, the intercom receiving end device, the cloud server, and the like can obtain the collected image data.

For example, the intercom requesting end device is a doorbell outdoor unit, which can be understood as a doorbell device installed outside the door. After receiving an intercom request instruction, the doorbell outdoor unit starts to collect the image data of the intercom requesting end and sends the intercom request instruction to the doorbell indoor unit. After receiving the intercom request instruction, the doorbell indoor unit issues a prompt (such as a voice prompt, a prompt made by playing the image data of the intercom requesting end, or a vibration prompt, which is not limited here) to remind the visited user to answer the intercom. After receiving an instruction to answer the intercom, the doorbell indoor unit enters the intercom process, so that the visitor and the visited user can talk through the doorbell outdoor unit and the doorbell indoor unit, and the doorbell indoor unit can receive the image data collected by the image acquisition unit (such as a camera) of the doorbell outdoor unit and play it back. To protect privacy, the image of the intercom receiving end is generally not played at the intercom requesting end; that is, visitors generally cannot see the image of the visited user. Besides the doorbell outdoor unit, the doorbell indoor unit (the intercom receiving end), the cloud server, and the like can also obtain the image data by receiving the image data collected by the doorbell outdoor unit. For example, the image acquisition unit of the doorbell outdoor unit collects image data in real time and then sends it to the doorbell indoor unit or the cloud server through wired or wireless communication. The doorbell indoor unit is generally used indoors, and its functions can also be integrated into various terminal devices, such as TV sets, speakers, mobile phones, and tablet computers.
Taking a TV as an example, when an intercom request is received, a picture pops up on the TV screen (if the TV is in use, it can be shown as picture-in-picture over the program being played) to display the image of the intercom requesting end. After the TV receives the instruction from the visited user to answer the intercom, it establishes the intercom with the doorbell end.

It should be noted that the image acquisition unit of the doorbell outdoor unit can start to collect image data when the intercom request is initiated and continue to collect after the intercom starts, or can even start collecting earlier, for example with the image acquisition unit being started in connection with a particular region. Therefore, step S10 of this embodiment can also be applied to the intercom request process; that is, during the intercom request process, the image data collected in the intercom request process is acquired, where the image data includes the image data of the intercom requesting end. The image data may or may not be sent to the intercom receiving end.

S30. When the image data satisfies the first preset condition, trigger the intercom request process to end or trigger the intercom process to end.

It can be understood that the above-mentioned image data may be the image data of the intercom requesting end collected in real time during the intercom process or the intercom request process, the image data of the intercom receiving end, or both. That is, the end of the intercom request process or the end of the intercom process may be triggered when the image data of the intercom requesting end satisfies the first preset condition, when the image data of the intercom receiving end satisfies the first preset condition, or when both satisfy the first preset condition. In other words, when the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end, triggering the end of the intercom process or the end of the intercom request process when the image data satisfies the first preset condition includes: triggering the end of the intercom process or the end of the intercom request process when the image data of the intercom requesting end and/or the image data of the intercom receiving end satisfies the first preset condition.

It should be noted that triggering the end of the intercom process can be understood as triggering the end of the intercom, so that the visiting user and the visited user can no longer communicate through the intercom devices. Triggering the end of the intercom request process can be understood as triggering the end of the intercom request; that is, the intercom request is no longer issued while waiting for the user to accept it. In this case, the end of the intercom request process is triggered by analysis of the image data.

It can be understood that step S30 may include step S301 or step S302, therefore, step S10 and step S30 may include the following embodiments:

As an example, a visiting intercom control method includes:

Step S101. Determine the image data collected in the intercom request process, and the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end;

Step S301. When the image data satisfies the first preset condition, trigger the intercom request process to end;

Illustratively, an access control method includes:

Step S102. Determine the image data collected in the intercom process, and the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end;

Step S302. When the image data satisfies the first preset condition, the intercom process is triggered to end.
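The two variants above (S101 with S301, and S102 with S302) can be sketched as a small state transition. This is a minimal illustration only: the `Phase` enum and the `control_step` function are hypothetical names that do not appear in the application, and the image analysis itself is reduced to a boolean input.

```python
from enum import Enum, auto


class Phase(Enum):
    REQUESTING = auto()  # intercom request process: request issued, not yet accepted
    TALKING = auto()     # intercom process: the request has been accepted
    ENDED = auto()       # request or intercom has been ended


def control_step(phase: Phase, image_meets_first_condition: bool) -> Phase:
    """Apply one pass of S10 + S30 to the current phase.

    `image_meets_first_condition` stands for the result of analysing the
    image data determined in S101 or S102; how it is computed is left open.
    """
    if phase is Phase.ENDED:
        return phase
    if image_meets_first_condition:
        # S301 when requesting, S302 when talking: trigger the end.
        return Phase.ENDED
    return phase
```

The same function covers both phases because, in the steps above, the only difference between S301 and S302 is which process is ended.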

Generally speaking, during the intercom request process, controlling the intercom request according to the image data of the intercom requesting end avoids the situation in which the visitor has left but the intercom request continues pointlessly for some time, which would consume power, disturb the surroundings or the visited user indoors, or require the visitor or the visited user to end the request manually. Of course, during the intercom request process, the intercom request can also be controlled according to the image data of the intercom receiving end. For example, when the image of the receiving end satisfies the first preset condition, for example it contains no character feature information, the intercom request is ended, so that the visitor is not kept waiting. The intercom request can also be controlled according to both the image data of the intercom requesting end and the image data of the intercom receiving end, with the end of the intercom request process triggered only when both satisfy the first preset condition, so as to avoid false triggers caused by the visited user being outside the monitoring range of the receiving end device or by misjudging the feature information of the visitor or the visited user. In addition, during the intercom process, the intercom can be controlled according to only the image data of the intercom requesting end, so that the intercom does not continue after the visitor has left; according to only the image data of the intercom receiving end, so that the visitor is not kept waiting when the visited user is no longer present; or according to the image data of both ends, so that the end of the intercom is triggered only when both satisfy the first preset condition, which improves the convenience of operation and the accuracy of control.
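The three control options described above (requesting end only, receiving end only, or both ends) can be sketched as follows. The `Source` enum and the function name are hypothetical and introduced only for illustration; each boolean stands for whether that end's image data satisfies the first preset condition.

```python
from enum import Enum


class Source(Enum):
    REQUESTER_ONLY = "requester"
    RECEIVER_ONLY = "receiver"
    BOTH = "both"


def first_condition_met(source: Source, requester_ok: bool, receiver_ok: bool) -> bool:
    """Decide whether to trigger the end, given which image data is consulted."""
    if source is Source.REQUESTER_ONLY:
        return requester_ok
    if source is Source.RECEIVER_ONLY:
        return receiver_ok
    # BOTH: trigger only when both ends satisfy the condition, which
    # reduces false triggers from a misjudgment at a single end.
    return requester_ok and receiver_ok
```

Requiring both ends trades some responsiveness for fewer false triggers, which matches the motivation given in the paragraph above.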

It can be understood that the image data may be regarded as satisfying the first preset condition when a single sampled frame image in the acquired image data (that is, the current sampled frame image) satisfies the first preset condition, at which point the end of the intercom request process or the end of the intercom process is triggered. This approach can trigger the end relatively quickly, but may produce some misjudgments, for example when a posture change temporarily takes the visitor outside the camera's range. Alternatively, the image data may be regarded as satisfying the first preset condition when the images of a preset number of consecutive sampled frames all satisfy the first preset condition, which improves the accuracy of the judgment. The image data may also be regarded as satisfying the first preset condition when it continuously satisfies the condition for a first preset time; that is, the measure is duration rather than number of frames, which likewise improves accuracy. Where there is no conflict, the above approaches can be used in combination.
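The three ways of judging described above (single sampled frame, a preset number of consecutive sampled frames, and a preset duration) might be sketched as below. The `Sample` type and the function names are assumptions made for this illustration; each sample pairs a timestamp in seconds with the per-frame result, and samples are assumed to be in time order.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, frame satisfies the condition)


def met_single(samples: List[Sample]) -> bool:
    """Fast but error-prone: only the current sampled frame is considered."""
    return bool(samples) and samples[-1][1]


def met_consecutive(samples: List[Sample], n: int) -> bool:
    """The last n sampled frames must all satisfy the condition."""
    if n <= 0 or len(samples) < n:
        return False
    return all(ok for _, ok in samples[-n:])


def met_duration(samples: List[Sample], seconds: float) -> bool:
    """The trailing run of satisfying frames must span at least `seconds`."""
    run_start = None
    for t, ok in reversed(samples):
        if not ok:
            break
        run_start = t
    if run_start is None:
        return False
    return samples[-1][0] - run_start >= seconds
```

For instance, with samples `[(0.0, False), (0.5, True), (1.0, True), (1.5, True)]`, the single-frame check and the three-frame check both pass, while a four-frame check does not.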

It should be noted that the sampled frame images may be some or all of the frames selected from the determined image data for analysis. Sampling can be performed at intervals of a specific number of frames, where the specific number is a non-negative integer (an integer greater than or equal to zero); that is, every frame can be sampled and analyzed, or frames can be analyzed at a specific frame interval. Of course, frames can also be sampled and analyzed at specific time intervals.
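Both sampling schemes mentioned above can be sketched as generators. The function names and the `skip` and `interval` parameters are hypothetical; `skip` is the number of frames passed over between samples, so `skip=0` analyzes every frame.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def sample_every(frames: Iterable[T], skip: int) -> Iterator[T]:
    """Yield one frame, pass over `skip` frames, and repeat."""
    if skip < 0:
        raise ValueError("skip must be an integer greater than or equal to zero")
    for index, frame in enumerate(frames):
        if index % (skip + 1) == 0:
            yield frame


def sample_by_time(
    frames: Iterable[Tuple[float, T]], interval: float
) -> Iterator[Tuple[float, T]]:
    """Yield a (timestamp, frame) pair at most once per `interval` seconds."""
    next_due = None
    for t, frame in frames:
        if next_due is None or t >= next_due:
            yield t, frame
            next_due = t + interval
```

A larger `skip` or `interval` reduces the analysis load at the cost of slower reaction to a change in the scene.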

It can be understood that the image data satisfying the first preset condition may mean that no character feature information is detected in the image data, that preset character behavior information is detected in the image data, or that preset character behavior information is detected in the image data and character feature information is subsequently not detected. In the last case, if no character feature information is detected in the image data within a preset time period after the preset character behavior information is detected, it is determined that the image data satisfies the first preset condition; this corresponds to the person first performing the preset behavior and then leaving the monitored area. The preset time period can be set according to actual needs and is not limited here. The character feature information may include at least one of face information, human body contour information, and human body infrared information. The preset character behavior information includes at least one of back-turn information, side-turn information, and moving-away information. The face information may include, but is not limited to, facial feature information, skin color information, face contour information, pupil information, and the like. The human body contour information may be a partial or entire human body contour, such as a head contour, an upper body contour, a side contour, a front contour, or a back contour. It can be understood that the back-turn information, the side-turn information, or the moving-away information can be obtained from changes in the face information and the human body contour information.
For example, the back-turn information may be determined by detecting a change from the front contour to the side contour and then to the back contour, the side-turn information may be determined by detecting a change from the front contour to the side contour, and the moving-away information may be determined by detecting a change in the proportion of the image occupied by the back contour.
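The contour-change rules for side-turn and back-turn information can be sketched as pattern matching over per-frame contour labels. This assumes an upstream classifier, not described in the application, that labels each sampled frame as "front", "side", or "back"; the function names are hypothetical.

```python
from typing import List, Sequence


def collapse(labels: Sequence[str]) -> List[str]:
    """Merge runs of identical labels, e.g. front, front, side -> front, side."""
    out: List[str] = []
    for label in labels:
        if not out or out[-1] != label:
            out.append(label)
    return out


def contains_run(seq: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` appears in `seq` as a contiguous run."""
    n = len(pattern)
    return any(list(seq[i:i + n]) == list(pattern) for i in range(len(seq) - n + 1))


def detect_side_turn(labels: Sequence[str]) -> bool:
    # Side turn: the contour changes from front to side.
    return contains_run(collapse(labels), ["front", "side"])


def detect_back_turn(labels: Sequence[str]) -> bool:
    # Back turn: the contour changes from front to side and then to back.
    return contains_run(collapse(labels), ["front", "side", "back"])
```

Collapsing repeated labels first means the result does not depend on how many sampled frames each pose lasts.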

For example, the image acquisition unit of the doorbell outdoor unit collects image data in real time during the intercom process. When no character feature information is detected in the collected image data, it can be considered that the visitor has left the collection range of the image acquisition unit, that is, the visitor has left, so the intercom is ended automatically without the visited user or the visitor having to end it. For example, the absence of face information makes it possible to determine quickly that the visitor is about to leave or has already left: when the visitor turns around, the face information can no longer be detected, so the intercom is ended automatically, which avoids manual operation by the user and ends the intercom as quickly as possible. In particular, when the doorbell indoor unit is integrated into another device such as a TV, the intercom can be ended as soon as possible so that the visited user can continue watching TV; similarly, when it is integrated into a speaker, the intercom can be ended as soon as possible so that the visited user can go on listening to music. When human body contour information is used as the character feature information, failure to detect human body contour information in the image data means that the visitor has left, so whether the intercom needs to end can be judged more accurately.
However, this may cause the end to be not timely enough, so the judgment can also be made by detecting the preset character behavior information, or by combining the character feature information with the preset character behavior information, so that the intercom request or the intercom is ended automatically without manual closing by the user or visitor. This also avoids the resource occupation that results when the intercom or intercom request continues after the visitor or the visited user has left.
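One way to combine the two kinds of information is the rule that a preset behavior is detected first and no character feature information is detected within a preset time period afterwards. The sketch below assumes per-frame detection results are already available; the `Observation` type, its fields, and the `window` parameter are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Observation(NamedTuple):
    t: float          # timestamp in seconds
    behaviour: bool   # preset character behavior detected in this frame
    person: bool      # character feature information detected in this frame


def left_after_behaviour(observations: Iterable[Observation], window: float) -> bool:
    """True if a frame without a person follows a detected behavior within `window` seconds."""
    behaviour_time = None
    for obs in observations:
        if obs.behaviour:
            # Remember the most recent time the preset behavior was seen.
            behaviour_time = obs.t
        elif (
            behaviour_time is not None
            and not obs.person
            and obs.t - behaviour_time <= window
        ):
            return True
    return False
```

With this rule, a visitor who merely steps out of frame without first turning or moving away does not trigger the end, which is the accuracy benefit of the combined judgment.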

It can be understood that failing to detect character feature information in the image data may include at least one of the following: no character feature information is detected in a sampled frame image of the image data; no character feature information is detected in the images of a preset number of consecutive sampled frames of the image data; no character feature information is detected in the image data for the first preset time. The image data can be collected in real time. If no character feature information is detected in a sampled frame image of the image data, which can be understood as the current sampled frame image, the end of the intercom request process or the end of the intercom process can be triggered. Of course, collection and analysis can be two separate processes, possibly carried out by two devices, so there is a certain delay between collecting the current image data and completing its analysis, but the impact is generally small. The determined (acquired) image data can be part or all of the collected image data, and the sampled data used for analysis can in turn be part or all of the determined (acquired) image data.

In addition, setting the preset number of sampling frames and the first preset time can increase the accuracy of detection. The preset number of sampling frames can be tracked by counting: the count can restart after the image data of the preset number of consecutive sampled frames has been analyzed, or it can start when it is determined that no character feature information is detected in the current sampled frame image data. The first preset time can be handled in a similar way. Other methods can also be used, which are not limited here.

It can be understood that detecting the preset character behavior information in the image data may include at least one of the following: in the images of consecutive sampled frames of the image data, it is detected that the human body contour changes from the front contour to the side contour (one way of detecting side-turn information); in the images of consecutive sampled frames of the image data, it is detected that the human body contour changes from the front contour to the side contour and then to the back contour (one way of detecting back-turn information); in the images of a first preset number of consecutive sampled frames of the image data, it is detected that the proportion of the image occupied by the human body contour becomes smaller (one way of detecting moving-away information, which detects the trend and can trigger the end action once the proportion has decreased over the first preset number of consecutive sampled frames); in the images of consecutive sampled frames of the image data, it is detected that the proportion of the image occupied by the human body contour becomes smaller and falls below a preset proportion (another way of detecting moving-away information); in the images of a second preset number of consecutive sampled frames of the image data, it is detected that the proportion of the whole human body contour that is captured increases (one way of detecting moving-away information, which detects the trend and can trigger the end action once the proportion has increased over the second preset number of consecutive sampled frames); in the images of a preset number of consecutive sampled frames of the image data, it is detected that the proportion of the whole human body contour that is captured increases and exceeds a preset proportion (another way of detecting moving-away information).

It should be noted that when analyzing the image data, sampling can be performed once every specific number of frames, where the specific number is an integer greater than or equal to zero; that is, every frame of image can be analyzed, or only some of the frames can be sampled. Since a character's behavior may be a continuous action, it needs to be determined by analyzing the image data of consecutive sampling frames. For example, the image data of consecutive sampling frames is analyzed, and when the preset character behavior is detected, for example when the contour in a sampled frame is a back contour, the end of the intercom process or the end of the intercom request process is triggered. Moving-away behavior can be determined from the proportion of the human body contour in the captured image. Generally speaking, when a person moves away, the area enclosed by the human body contour becomes a smaller share of the total image area. Specifically, on the basis of this decrease, the end of the intercom process or the end of the intercom request process can be triggered when the occupied proportion falls below the preset proportion. Alternatively, on the basis of this decrease, the number of sampled frames in which the proportion becomes smaller can be counted, for example starting from the first sampled frame in which it becomes smaller, and the related end action is triggered when this count reaches the first preset number of sampling frames. From another angle, since the camera may be fixed, more of the human body contour can be captured as the person moves away. For example, when the person is close, only the contour of the head can be captured; further away, the contour of the upper body can be captured; further still, the entire contour of the human body can be captured. The distance of the person can therefore also be judged in this way. When such a trend is detected, the end of the intercom or of the intercom request can be triggered; alternatively, the end of the intercom process or the end of the intercom request process is triggered when the preset proportion is reached. The number of sampled frames in which the proportion increases can also be counted, for example starting from the first sampled frame in which it increases, and the corresponding end action is triggered when this count reaches the second preset number of sampling frames. It can be seen that sampled frames are used when judging the behavior of the character: since the behavior of the character is relatively complex, directly using all collected consecutive frames may increase misjudgment. In addition, the detection effect may be related to the position and manner in which the image acquisition module is installed, so where there is no conflict, the above methods can be combined for detection to improve applicability.
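The two proportion-based checks for moving away can be sketched as below. The sketch assumes that each sampled frame has already been reduced to a single number, the share of the image area covered by the human body contour; how that number is computed, and the function and parameter names, are assumptions made for illustration.

```python
from typing import Sequence


def moving_away_by_trend(ratios: Sequence[float], preset_frames: int) -> bool:
    """Trend check: the proportion shrinks in `preset_frames` consecutive
    sampled frames, counted from the first frame in which it shrinks."""
    shrinking = 0
    for previous, current in zip(ratios, ratios[1:]):
        shrinking = shrinking + 1 if current < previous else 0
        if shrinking >= preset_frames:
            return True
    return False


def moving_away_by_threshold(ratios: Sequence[float], preset_ratio: float) -> bool:
    """Threshold check: the proportion shrinks and falls below `preset_ratio`."""
    return any(current < previous and current < preset_ratio
               for previous, current in zip(ratios, ratios[1:]))
```

The same structure applies to the alternative check on the share of the whole body contour that is visible, with the comparisons reversed (the share increases and exceeds a preset proportion).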

It can be understood that the contour can be that of the head, the upper body, the whole body, etc., depending on what the image acquisition device can capture, for example on the installation position of the camera and the width of its viewing angle. In addition, the frontal contour, back contour or side contour can be determined according to the face information; in particular, because the frontal contour may closely resemble the back contour, the two can be distinguished with the help of face information such as facial features. Of course, the side turn information, the back turn information and the distance information can also be determined in other ways. It should be noted that triggering the termination of the intercom request or of the intercom upon detecting side turn information allows a quick response, but also increases misjudgment, because the visitor may merely have adjusted posture rather than intending to end the intercom or the intercom request. Triggering the termination upon detecting back turn information is more reliable than using side turn information, and further detecting distance information indicates with high probability that the visitor is leaving, so closing the intercom or the intercom request on that basis is relatively more accurate. However, these differences in effect are relative and may differ in some other scenarios.

It can be understood that when the image data includes both the image data of the intercom requesting end and the image data of the intercom receiving end, triggering the end of the intercom process or the end of the intercom request process when the image data satisfies the first preset condition may include: when the image data of the intercom requesting end satisfies the first preset condition and the image data of the intercom receiving end satisfies the first preset condition, the end of the intercom process or the end of the intercom request process is triggered. The first preset condition may have various specific implementations. Therefore, the first preset condition satisfied by the image data of the intercom requesting end and the first preset condition satisfied by the image data of the intercom receiving end may be the same or different. That is to say, the specific conditions met by the two may differ: for example, no character information is detected in the image data of the intercom requesting end, while moving-away behavior is detected in the image data of the intercom receiving end.
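The combination described above, in which each end has its own (possibly different) first preset condition and both must hold, could look like the following sketch. The per-frame dictionaries with "person" and "moving_away" keys are a hypothetical analysis result format chosen only for this example.

```python
from typing import Callable, Dict, Sequence

FrameResult = Dict[str, bool]  # hypothetical per-frame analysis result
Condition = Callable[[Sequence[FrameResult]], bool]


def no_person_detected(frames: Sequence[FrameResult]) -> bool:
    """No character information in any of the analysed sampled frames."""
    return len(frames) > 0 and all(not f["person"] for f in frames)


def moving_away_detected(frames: Sequence[FrameResult]) -> bool:
    """Moving-away behaviour flagged in at least one analysed sampled frame."""
    return any(f.get("moving_away", False) for f in frames)


def both_ends_satisfied(requester_frames: Sequence[FrameResult],
                        receiver_frames: Sequence[FrameResult],
                        requester_condition: Condition,
                        receiver_condition: Condition) -> bool:
    """End the intercom only if each end meets its own first preset condition."""
    return (requester_condition(requester_frames)
            and receiver_condition(receiver_frames))
```

Passing the conditions in as functions keeps the two ends independent, so the requesting end can use an absence check while the receiving end uses a behavior check.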

For example, when no face information, human body contour information, or human body infrared information is detected in the image data, it can be considered that the person on that side of the intercom has left. The captured image data is monitored, and when no face information is detected in a certain sampled frame or in the preset number of sampled frames, the end of the intercom process is triggered. Using a preset number of sampling frames also prevents misjudgment caused by a visitor's temporary movement during the intercom process. In this regard, an appropriate preset number of sampling frames can be chosen by weighing the accuracy of the judgment against the timeliness of the trigger.

In practical applications, the image data of the intercom requesting end is generally collected by the intercom requesting end device, and the image data of the intercom receiving end is collected by the intercom receiving end device. The intercom requesting end device is generally installed outdoors to obtain visitor information, while the intercom receiving end device can be installed indoors. The receiving end device can be a dedicated device, such as a doorbell, or be integrated into other electronic devices, such as speakers, and there can be one or more such devices. Furthermore, the acquisition of image data and the analysis of image data may be performed by different devices.

It should be noted that among the above-mentioned intercom requesting end devices and intercom receiving end devices, traditional routers, gateways, CPEs, speakers or TV boxes may not have an image acquisition function, but some of these products may integrate image acquisition and even image display functions, such as speakers with cameras or speakers with displays. That is to say, the intercom receiving end devices listed above may or may not have an image acquisition function. When such a device does not have an image acquisition function and the image data of the intercom receiving end is needed, other devices with an image acquisition function, such as smart cameras, can be used to collect the image data of the intercom receiving end. Therefore, if the image data of the intercom receiving end needs to be used in some scenarios, it can be considered that the above-mentioned intercom requesting end device or intercom receiving end device either has an image acquisition function or can obtain the image data of the intercom receiving end from a device that has one.

In practical applications, the playback mode of the image data of the intercom requesting end can be determined according to the character information data or the device state data determined by the intercom receiving end device. Specifically, this may include but is not limited to the following cases.

In one case, when it is determined that the TV is in the running state, it is determined that the image data of the intercom requesting end is played through the TV in picture-in-picture mode; this reduces the impact on the interviewed user's TV viewing. In one case, when it is determined that the TV is in the running state and that the interviewed user and the TV are within the preset range, it is determined that the image data of the intercom requesting end is played through the TV in picture-in-picture mode; here the positional relationship between the interviewed user and the TV is also considered, in order to infer whether the interviewed user can notice the intercom request or whether it is convenient for the interviewed user to use the intercom. The preset range can be set by the interviewed user as needed, or a default range can be set at the factory. For example, the default is the range that can be captured by the TV camera; that is, the image data captured by the TV camera (such as the image data of the intercom receiving end) is checked for interviewed user information, and if such information is present, the interviewed user is considered to be within the preset range.

In one case, when it is determined that the TV is in the off state, it is determined that the image data of the intercom requesting end is played through the TV in full-screen display mode. If the TV is completely off, the TV is started so that it runs, and the image data of the intercom requesting end is played through the TV in full-screen display mode. If the TV is in a sleep state, the TV is woken up so that it is in the running state, and the image data of the intercom requesting end is played through the TV in full-screen display mode. In this way the image data of the intercom requesting end can be displayed in full screen without affecting the interviewed user's TV viewing. In one case, when it is determined that the TV is in the off state and that the interviewed user and the TV are within the preset range, it is determined that the image data of the intercom requesting end is played through the TV in full-screen display mode; as in the previous case, the positional relationship between the interviewed user and the TV is considered, and details are not repeated here.

In addition to the TV scenario, the intercom receiving end device can also be a mobile phone. In one case, when it is determined that the mobile phone is being used by the interviewed user, it is determined that the image data of the intercom requesting end is played through the mobile phone. Whether the interviewed user is using the mobile phone can be judged by detecting whether the interviewed user is touching the screen of the mobile phone or whether the mobile phone is playing audio or video data. Of course, image data captured by the mobile phone camera can also be used: if interviewed user information is obtained from it, it is determined that the interviewed user is using the mobile phone. The device can also be a computer, including home computers, tablet computers, etc. When it is determined that the computer is being used by the interviewed user, it is determined that the image data of the intercom requesting end is played through the computer; the use state can be determined in the same way as for a mobile phone or a TV, which is not repeated here.

The above methods can be used in combination where there is no conflict, to increase applicability to different scenarios. Therefore, when the interviewed user is using a certain intercom receiving end device, the image data can preferentially be played through that device, so as to quickly remind the interviewed user that there is a visitor, or to make it more convenient for the user to use the intercom, etc.
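The playback decisions described above can be summarized in a small decision function. This is only a sketch: the `TvState` enum, the `require_proximity` flag, the returned (device, mode) pairs and the decision to return every applicable device rather than a single one are all assumptions, since the present application leaves the combination of cases open.

```python
from enum import Enum
from typing import List, Tuple


class TvState(Enum):
    RUNNING = "running"
    OFF = "off"      # completely off: the TV is started before playback
    SLEEP = "sleep"  # sleeping: the TV is woken up before playback


def choose_playback(tv_state: TvState,
                    user_near_tv: bool,
                    phone_in_use: bool,
                    computer_in_use: bool,
                    require_proximity: bool = False) -> List[Tuple[str, str]]:
    """Return (device, mode) pairs for playing the requesting-end image data."""
    plans: List[Tuple[str, str]] = []
    # TV cases: picture-in-picture while running, full screen otherwise.
    if user_near_tv or not require_proximity:
        mode = "picture-in-picture" if tv_state is TvState.RUNNING else "full-screen"
        plans.append(("tv", mode))
    # Devices the interviewed user is currently using.
    if phone_in_use:
        plans.append(("phone", "play"))
    if computer_in_use:
        plans.append(("computer", "play"))
    return plans
```

For example, with `require_proximity=True` a sleeping TV in an empty room is skipped and only the phone in the user's hand is selected.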

Based on the technical solutions described above, to further facilitate use, the above visiting intercom control method further includes: S20. Determine the intercom voice collected during the intercom process. In the intercom process, the above step S30 (when the image data satisfies the first preset condition, trigger the end of the intercom process or trigger the end of the intercom request process) can be replaced with step S40: when the image data satisfies the first preset condition and the intercom voice satisfies the second preset condition, trigger the end of the intercom process. That is to say, the judgment is based not only on the image data but also on the intercom voice data, which increases the accuracy of the judgment. Here, the intercom voice satisfying the second preset condition may include: the intercom voice includes a preset keyword, or no intercom voice is acquired for a preset time. For example, when preset keywords such as "end intercom" and "goodbye" are detected in the voice, the intercom voice is considered to satisfy the second preset condition. Alternatively, after the two parties have talked to each other, no intercom voice is acquired for the preset time; this may mean that there is no voice signal in the collected sound signal, that no sound signal can be collected at all, or that the voice signals of the two intercom parties cannot be collected. In other words, the voice information of the two parties who have just been talking can be identified in order to determine whether both have stopped talking, which can improve accuracy in a noisy sound environment.
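Step S40 can be illustrated with the sketch below. It assumes that speech recognition has already produced a text transcript and that the time since the last intercom voice is tracked elsewhere; the function names and the 10-second default are hypothetical, while the keywords are the examples given above.

```python
PRESET_KEYWORDS = ("end intercom", "goodbye")  # example keywords from the text


def voice_condition_met(transcript: str,
                        seconds_since_last_voice: float,
                        preset_silence: float) -> bool:
    """Second preset condition: a preset keyword was spoken, or no intercom
    voice has been acquired for the preset time."""
    text = transcript.lower()
    keyword_found = any(keyword in text for keyword in PRESET_KEYWORDS)
    return keyword_found or seconds_since_last_voice >= preset_silence


def should_end_intercom(image_condition_met: bool,
                        transcript: str,
                        seconds_since_last_voice: float,
                        preset_silence: float = 10.0) -> bool:
    """Step S40: both the image condition and the voice condition must hold."""
    return image_condition_met and voice_condition_met(
        transcript, seconds_since_last_voice, preset_silence)
```

Requiring both conditions means a spoken "goodbye" alone does not end the session while the visitor is still clearly in front of the camera.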

It can be understood that, to improve convenience of use, taking the TV as the receiving end device as an example: during the intercom request process, if the TV or the remote control of the TV collects (for example, through the microphone on the TV or on the remote control) the first preset voice of the interviewed user at the intercom receiving end, the intercom process is triggered; or, during the intercom process, if the TV or the remote control of the TV collects (for example, through the microphone on the TV or on the remote control) the second preset voice of the interviewed user at the intercom receiving end, the end of the intercom is triggered. The first preset voice can be used to trigger entry into the intercom stage; for example, it is detected that the interviewed user says a preset voice such as "receive intercom" or "open intercom", and the specific content is not limited here. The second preset voice can be used to end the intercom, and may be a voice containing preset keywords such as "end the intercom" and "goodbye". This method uses voice control, which makes it convenient for the user to control the intercom. It should be noted that in this solution the voice of the receiving end user collected by the TV or the TV remote control is explicitly not the intercom voice, so that it is not affected by the voice of the visitor. The second preset voice used to end the intercom or the intercom request may be the interviewed user's voice collected by the TV or the remote control, or it may be the intercom voice; for the latter, refer to the description of the second preset condition above, which is not repeated here.
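The voice-controlled transitions described above amount to a small state machine, sketched here under the assumption that the receiving end user's speech is available as text. The `Stage` enum and `next_stage` function are hypothetical names; the phrases are the examples given in the text.

```python
from enum import Enum, auto


class Stage(Enum):
    REQUEST = auto()   # intercom request process
    INTERCOM = auto()  # intercom process
    ENDED = auto()


FIRST_PRESET_VOICE = ("receive intercom", "open intercom")
SECOND_PRESET_VOICE = ("end the intercom", "goodbye")


def next_stage(stage: Stage, receiver_utterance: str) -> Stage:
    """Advance the stage based on speech collected by the TV or its remote."""
    text = receiver_utterance.lower()
    if stage is Stage.REQUEST and any(p in text for p in FIRST_PRESET_VOICE):
        return Stage.INTERCOM
    if stage is Stage.INTERCOM and any(p in text for p in SECOND_PRESET_VOICE):
        return Stage.ENDED
    return stage  # no matching preset voice: stay in the current stage
```

Because each preset voice is only honored in its own stage, saying "goodbye" while a request is still pending has no effect in this sketch.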

It should be noted that the electronic device can be an intercom requesting end device, an intercom receiving end device, or a cloud server. That is to say, steps S10 and S30 can both be performed by the intercom requesting end device. For example, the image acquisition unit of the intercom requesting end device collects the image data of the requesting end during the intercom request or the intercom process, and the image acquisition unit of the intercom receiving end collects the image data of the receiving end during the intercom request or the intercom process. The intercom requesting end device (for example, its processor) then determines (which can be understood as acquires) the image data (including the image data of the intercom requesting end and/or the image data of the intercom receiving end), and when the intercom requesting end device determines that the image data satisfies the first preset condition, it triggers the end of the intercom request process or the end of the intercom process. The trigger here can be direct control of the end, or sending a signal to the intercom receiving end device so that the intercom receiving end device controls the end; both fall within the protection scope of the trigger, and there is no limitation on this.

Optionally, steps S10 and S30 may both be performed by the intercom receiving end device. For example, the image acquisition unit of the intercom requesting end device collects the image data of the requesting end during the intercom request or the intercom process, and the image acquisition unit of the intercom receiving end collects the image data of the receiving end during the intercom request or the intercom process. The intercom receiving end device (for example, its processor) then determines (which can be understood as acquires) the image data (including the image data of the intercom requesting end and/or the image data of the intercom receiving end), and when the intercom receiving end device determines that the image data satisfies the first preset condition, it triggers the end of the intercom request process or the end of the intercom process. The trigger here can be direct control of the end, or sending a signal to the intercom requesting end device so that the intercom requesting end device controls the end; both fall within the protection scope of the trigger, and there is no limitation on this.

Optionally, steps S10 and S30 may both be performed by a cloud server. For example, the image acquisition unit of the intercom requesting end device collects the image data of the requesting end during the intercom request or the intercom process, and the image acquisition unit of the intercom receiving end collects the image data of the receiving end during the intercom request or the intercom process. The cloud server then determines (which can be understood as acquires) the image data (including the image data of the intercom requesting end and/or the image data of the intercom receiving end), and when the cloud server determines that the image data satisfies the first preset condition, it triggers the end of the intercom request process or the end of the intercom process. The trigger here can be sending a signal to the intercom requesting end device or to the intercom receiving end device, so that the intercom requesting end device or the intercom receiving end device controls the end; both fall within the protection scope of the trigger, and there is no limitation on this.

It should be noted that, since the electronic device may include at least one of an intercom requesting end device, an intercom receiving end device, and a cloud server, steps S10 and S30 may be performed on the same or on different electronic devices.

It can be seen that the image data collected during the intercom process or the intercom request process is determined, the image data including the image data of the intercom requesting end and/or the image data of the intercom receiving end, and that when the image data satisfies the first preset condition, the end of the intercom process or the end of the intercom request process is triggered. Therefore, no manual operation by the user is required, which is convenient for the user. Further, by setting the first preset condition, the timeliness of ending the intercom request or ending the intercom can be improved, resources can be released in time, and the impact on the user's use of other devices can be reduced, and so on.

Referring to FIG. 2, the present application also provides another method for controlling a visiting intercom, which is applied to an electronic device, and the method for controlling a visiting intercom includes:

S100. Determine the image data collected in the intercom process and the intercom voice collected during the intercom process, the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end; the intercom voice includes the voice of the intercom requesting end and/or the voice of the intercom receiver.

It should be noted that, on the basis of the previous embodiment, this embodiment combines the intercom voice and the image data collected during the intercom process to control the intercom comprehensively, so the accuracy is higher. The intercom voice can be collected in real time by the microphone of the intercom requesting end device or by the microphone of the intercom receiving end device. Determining the intercom voice collected during the intercom process may mean determining part or all of the collected intercom voice. In addition, the image data satisfying the first preset condition may include: no character feature information is detected in the image data, and/or character preset behavior information is detected in the image data. The character feature information may include at least one of face information, human body contour information, and human body infrared information; the character preset behavior information may include at least one of back turn information, side turn information, and distance information. The back turn information, the side turn information or the distance information can be obtained from changes in the face information and the human body contour information. It can be understood that no character feature information being detected in the image data may include but is not limited to:

(1) No character feature information is detected in a sampled frame image in the image data; or,

(2) No character feature information is detected in the image of the continuous preset sampling frame number in the image data; or,

(3) No person feature information is detected in the image data for the first preset time.
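Variant (3), absence for the first preset time, differs from the frame-count variants in that it works on timestamps. A minimal sketch follows, assuming each analysed sample is available as a (timestamp in seconds, person detected) pair in ascending time order; this input format and the function name are illustrative assumptions.

```python
from typing import Iterable, Optional, Tuple


def absent_for_preset_time(observations: Iterable[Tuple[float, bool]],
                           preset_seconds: float) -> bool:
    """True once no character feature information has been detected for at
    least `preset_seconds` without interruption."""
    absent_since: Optional[float] = None
    for timestamp, detected in observations:
        if detected:
            absent_since = None  # the person reappeared: restart timing
            continue
        if absent_since is None:
            absent_since = timestamp  # first sample of this absence period
        if timestamp - absent_since >= preset_seconds:
            return True
    return False
```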

It can be understood that detecting character preset behavior information in the image data may include but is not limited to:

(1) It is detected that the contour of the human body changes from the frontal contour to the side contour in the image of the consecutive sampling frames in the image data; or,

(2) It is detected that the human body contour changes from the frontal contour to the side contour and then to the back contour in the image of the consecutive sampling frames in the image data; or,

(3) It is detected that the proportion of the human body contour in the image in the image of the continuous first preset sampling frame number in the image data becomes smaller; or,

(4) In the image of the continuous sampling frame number in the image data, it is detected that the proportion of the human body contour in the image becomes smaller and is smaller than the preset proportion; or,

(5) It is detected that the proportion of the contour of the human body to the total contour of the human body is increased in the image of the consecutive preset second sampling frame number in the image data; or,

(6) In the image of the continuous sampling frame number in the image data, it is detected that the proportion of the detected human body contour to the total human body contour increases and is larger than a preset proportion.

It should be noted that the frontal contour, the back contour or the side contour can be determined according to the face information. The specific ways exemplified above in which no character feature information is detected in the image data, and the specific ways in which character preset behavior information is detected in the image data, can be used in combination where there is no conflict.

S300. When the image data satisfies the first preset condition and the intercom voice satisfies the second preset condition, trigger the end of the intercom process.

It should be noted that when the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end, the image data satisfying the first preset condition includes: the image data of the intercom requesting end and/or the image data of the intercom receiving end satisfies the first preset condition. When the intercom voice includes the voice of the intercom requesting end and/or the voice of the intercom receiving end, the intercom voice satisfying the second preset condition includes: the voice of the intercom requesting end and/or the voice of the intercom receiving end satisfies the second preset condition.

It can be understood that the intercom voice satisfying the second preset condition includes: the intercom voice includes a preset keyword, or no intercom voice is acquired for a preset time. Further explanation of the preset keywords and of no intercom voice being acquired for the preset time can be found in the foregoing description and is not repeated here.

It can be understood that the electronic device includes at least one of an intercom requesting end device, an intercom receiving end device, and a cloud server. Optionally, the intercom requesting end device includes at least one of a doorbell outdoor unit, a camera, and an access control outdoor unit; the intercom receiving end device includes at least one of a doorbell indoor unit, an access control indoor unit, a TV, a router, a gateway device, customer premise equipment (CPE), a speaker, a smart camera, a TV box, a computer, and a mobile phone.

It can be understood that the image data of the intercom requesting end is collected by the intercom requesting end device; the image data of the intercom receiving end is collected by the device of the intercom receiving end.

It can be understood that the playback mode of the image data of the intercom requesting end is determined according to the character information data and/or the device status data determined by the intercom receiving end device. Further, determining the playback mode of the image data of the intercom requesting end according to the character information data and/or device status data determined by the intercom receiving end device includes but is not limited to:

When it is determined that the TV is in the running state, it is determined that the image data of the intercom requesting terminal is played through the TV in a picture-in-picture manner; or,

When it is determined that the TV is in the running state, and it is determined that the interviewed user and the TV are within the preset range, it is determined that the image data of the intercom requesting terminal is played through the TV in a picture-in-picture manner; or,

When it is determined that the TV is in an off state, it is determined that the image data of the intercom requesting end is played through the TV in a full-screen display mode; or,

When it is determined that the TV is turned off, and it is determined that the interviewed user and the TV are within the preset range, it is determined to play the image data of the intercom requesting end through the TV in a full-screen display mode; or,

When it is determined that the mobile phone is in use by the interviewed user, it is determined to play the image data of the intercom requester through the mobile phone; or,

When it is determined that the computer is in the use state of the interviewed user, it is determined that the image data of the intercom requesting end is played through the computer.

It should be noted that the specific playback modes exemplified above can be used in combination with each other without conflict.

It can be understood that, during the intercom request process, if the TV or the remote control of the TV collects the first preset voice of the interviewed user at the intercom receiving end, the intercom process is triggered; or, during the intercom process, if the TV or the remote control of the TV collects the second preset voice of the interviewed user at the intercom receiving end, the end of the intercom is triggered. For the first preset voice and the second preset voice, refer to the previous explanation, which is not repeated here.

This embodiment is mainly aimed at the control of the intercom process. Where technical features in some technical solutions are not further explained or the related technical effects are not described, refer to the descriptions in the previous relevant parts, which are not repeated here. The technical solution in this embodiment considers not only the image data but also the intercom voice: ending the intercom requires both that the image data satisfies the first preset condition and that the intercom voice satisfies the second preset condition, thus further improving the accuracy of the control.

It can be understood that, referring to FIG. 3, the present application also provides an electronic device 500, comprising: one or more processors 510; a memory 520; and one or more application programs, wherein the one or more application programs are stored in the memory 520 and configured to be executed by the one or more processors 510, and the one or more application programs are configured to perform any one of the above methods.

It can be understood that, referring to FIG. 4, the present application also provides a computer-readable storage medium 600, where program codes 610 are stored in the computer-readable storage medium 600, and the program codes 610 can be called by a processor to execute any one of the above methods.

Referring to FIG. 5 , the present application further provides a visiting intercom control device 1000 , and the visiting intercom control device 1000 includes:

Determining unit 1010, configured to determine the image data collected in the intercom process or the intercom request process, wherein the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end;

The triggering unit 1020 is configured to trigger the end of the intercom process or the end of the intercom request process when the image data satisfies the first preset condition.

It can be understood that the triggering unit 1020 includes a detection module configured to determine that the image data satisfies the first preset condition; further, the detection module is configured to determine that no character feature information is detected in the image data, and/or the detection module is configured to determine that character preset behavior information is detected in the image data. The character feature information includes at least one of face information, human body contour information, and human body infrared information; and/or, the character preset behavior information includes at least one of back turn information, side turn information, and distance information. Further, the back turn information, the side turn information or the distance information is obtained from changes in the face information and the human body contour information.

It can be understood that the detection module can also be configured to determine that no character feature information is detected in a frame image in the image data; the detection module can also be configured to determine that no character feature information is detected in the images of the preset number of consecutive sampling frames in the image data; the detection module may also be configured to determine that no character feature information is detected in the image data for a first preset time. It can be understood that the detection module can also be configured to determine that, in the images of consecutive sampling frames in the image data, the human body contour changes from a frontal contour to a side contour; the detection module can also be configured to determine that, in the images of consecutive sampling frames in the image data, the human body contour changes from the frontal contour to the side contour and then to the back contour; the detection module can also be configured to determine that, in the images of the first preset number of consecutive sampling frames in the image data, the proportion of the human body contour in the image becomes smaller; the detection module may be further configured to determine that, in the images of the second preset number of consecutive sampling frames in the image data, the proportion of the detected human body contour relative to the whole human body contour increases. The frontal contour, the back contour or the side contour can be determined according to the face information.

It can be understood that, when the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end, the triggering unit 1020 can be configured to trigger the end of the intercom process or the end of the intercom request process when the image data of the intercom requesting end and/or the image data of the intercom receiving end satisfies the first preset condition.

It can be understood that the visiting intercom control device can be applied to electronic devices, and the electronic device 500 includes, but is not limited to, at least one of an intercom requesting end device, an intercom receiving end device, and a cloud server. The intercom requesting end device includes, but is not limited to, at least one of a doorbell outdoor unit, a camera, and an access control outdoor unit; the intercom receiving end device includes, but is not limited to, at least one of a doorbell indoor unit, an access control indoor unit, a TV, a router, a gateway device, Customer Premise Equipment (CPE), a speaker, a smart camera, a TV box, a computer, and a mobile phone. Since the intercom control device can be applied to electronic equipment, and the electronic device 500 may be a single device or multiple devices (two or more), the units and modules of the intercom control device can be applied to different electronic devices without conflict.

It can be understood that the image data of the intercom requesting end is collected by the intercom requesting end device, and the image data of the intercom receiving end is collected by the intercom receiving end device. The intercom control device further includes a playback control unit, which can be configured to determine the playback mode of the image data of the intercom requesting end according to the character information data and/or device status data determined by the intercom receiving end device. The intercom control device can be applied to the intercom requesting end device: although the image data of the intercom requesting end is played on the intercom receiving end, when there are multiple intercom receiving end devices, the intercom requesting end device can determine which intercom receiving end device plays it. The same applies to the cloud server, which will not be repeated here. When the intercom control device is applied to the intercom receiving end device, the intercom receiving end device determines the playback mode of the image data of the intercom requesting end.

It can be understood that the playback control unit can be configured to play the image data of the intercom requesting end through the TV in a picture-in-picture manner when it is determined that the TV is in a running state; to play the image data of the intercom requesting end through the TV in a picture-in-picture manner when it is determined that the TV is in a running state and that the interviewed user and the TV are within a preset range; to play the image data of the intercom requesting end through the TV in a full-screen display mode when it is determined that the TV is in an off state; to play the image data of the intercom requesting end through the TV in a full-screen display mode when it is determined that the TV is in an off state and that the interviewed user and the TV are within the preset range; to play the image data of the intercom requesting end through the mobile phone when it is determined that the mobile phone is being used by the interviewed user; and to play the image data of the intercom requesting end through the computer when it is determined that the computer is being used by the interviewed user.
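
The playback alternatives above can be pictured as a small decision function. The following is a minimal sketch under stated assumptions: the application lists the alternatives without ranking them, so the priority order used here (a device in use first, then the TV when the user is nearby) is an assumption, and the names `ReceiverState`, `PlaybackMode`, and `choose_playback` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class PlaybackMode(Enum):
    TV_PICTURE_IN_PICTURE = "tv_picture_in_picture"
    TV_FULL_SCREEN = "tv_full_screen"
    PHONE = "phone"
    COMPUTER = "computer"
    NONE = "none"


@dataclass
class ReceiverState:
    """Device status and person data at the receiving end (hypothetical fields)."""
    tv_running: bool = False
    user_near_tv: bool = False      # interviewed user within the preset range of the TV
    phone_in_use: bool = False
    computer_in_use: bool = False


def choose_playback(state: ReceiverState) -> PlaybackMode:
    """Pick where and how to show the requesting-end video.

    Assumed priority: a device the user is actively using wins; otherwise
    the TV is used only when the user is within range of it.
    """
    if state.phone_in_use:
        return PlaybackMode.PHONE
    if state.computer_in_use:
        return PlaybackMode.COMPUTER
    if state.user_near_tv:
        # A running TV keeps its programme and overlays the visitor video;
        # a TV that is off shows the visitor video full screen.
        if state.tv_running:
            return PlaybackMode.TV_PICTURE_IN_PICTURE
        return PlaybackMode.TV_FULL_SCREEN
    return PlaybackMode.NONE
```

For example, `choose_playback(ReceiverState(tv_running=True, user_near_tv=True))` selects picture-in-picture, so the programme being watched is not interrupted.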

It can be understood that the intercom control device further includes a start intercom control unit, which is configured to trigger entry into the intercom process if, during the intercom request process, the TV or the remote control of the TV collects a first preset voice of the interviewed user at the intercom receiving end; alternatively, the triggering unit 1020 can also be configured to trigger the end of the intercom if, during the intercom process, the TV or the remote control of the TV collects a second preset voice of the interviewed user at the intercom receiving end. It should be noted that the intercom control device can be applied to a TV, and can also be applied to other electronic devices.

It can be understood that during the intercom process, the triggering unit 1020 can be configured to trigger the end of the intercom process when the image data satisfies the first preset condition and the intercom voice satisfies the second preset condition. Further, the triggering unit 1020 may be configured to determine that the intercom speech includes a preset keyword or that the intercom speech has not been acquired for a preset time.
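
A minimal sketch of this combined check is given below. It assumes that the image condition has already been evaluated elsewhere and that the intercom voice is available as a text transcript; the keyword list, the silence timeout, and the function names are hypothetical.

```python
from typing import Sequence


def voice_condition_met(transcript: str,
                        seconds_since_last_speech: float,
                        keywords: Sequence[str] = ("goodbye", "bye"),
                        silence_timeout: float = 10.0) -> bool:
    """Second preset condition: a preset keyword was spoken, or no
    intercom voice has been acquired for the preset time."""
    text = transcript.lower()
    has_keyword = any(keyword in text for keyword in keywords)
    return has_keyword or seconds_since_last_speech >= silence_timeout


def should_end_intercom(image_condition_met: bool,
                        transcript: str,
                        seconds_since_last_speech: float) -> bool:
    """End the intercom only when both the image condition and the
    voice condition are satisfied."""
    return image_condition_met and voice_condition_met(
        transcript, seconds_since_last_speech)
```

Requiring both conditions means that a visitor who briefly steps out of frame while still talking does not end the session.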

It should be noted that the above-mentioned visiting intercom control device can intelligently determine whether to trigger the end of the intercom request or the end of the intercom according to the image data of the intercom requesting end and/or the intercom receiving end, so that manual operation is not required, which is convenient for the user. Further, it can also be combined with voice control to further facilitate the user's intercom operation. For technical effects of the device that are not detailed here, reference may be made to the related method section, which will not be repeated here.

The embodiment of the present application further provides another visiting intercom control device. The intercom control device includes: a determination unit configured to determine the image data collected during the intercom process and the intercom voice collected during the intercom process, wherein the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end, and the intercom voice includes the voice of the intercom requesting end and/or the voice of the intercom receiving end; and

The triggering unit 1020 is configured to trigger the end of the intercom process when the image data satisfies the first preset condition and the intercom voice satisfies the second preset condition.

The intercom control device corresponds to a related intercom control method, and other parts refer to the method part, which will not be repeated here.

In order to explain the various embodiments or implementations of the present application more clearly, please refer to FIG. 6, which illustrates a visiting intercom system 2000 that can provide a convenient intercom control experience for the visiting user 2010 and the interviewed user 2020. The system 2000 includes a doorbell device 2100 and a TV 2200. The doorbell device 2100 is connected to the TV 2200 by a wired or wireless connection; the wireless connection can be made through a common network or directly through Bluetooth or the like. At least one of the doorbell device 2100 and the TV 2200 may also be connected to a cloud server (not shown), and the system may include, in addition to the TV, other intercom receiving end devices 2300 such as tablet computers and mobile phones. As shown in FIG. 7, the system can be used to perform the following steps:

S200. The doorbell device 2100 or the TV 2200 determines the image data collected during the intercom process or the intercom request process, and the image data includes the image data of the intercom requesting end collected by the doorbell device 2100 and/or the image data of the intercom receiving end collected by the TV 2200;

S400. When the image data satisfies the first preset condition, the doorbell device 2100 or the TV 2200 triggers the end of the intercom process or triggers the end of the intercom request process.

It can be understood that the image data satisfying the first preset condition includes: no character feature information is detected in the image data, and/or character preset behavior information is detected in the image data. The character feature information includes at least one of face information, human body contour information, and human body infrared information; and the character preset behavior information includes at least one of backward turn information, side turn information, and away information. The backward turn information, the side turn information, or the away information is obtained according to changes in the face information and the human body contour information. That no character feature information is detected in the image data includes at least one of the following: no character feature information is detected in one frame of the image data; no character feature information is detected in a preset number of consecutive sampled frames of the image data; no character feature information is detected in the image data for the first preset time.

That character preset behavior information is detected in the image data includes at least one of the following: over consecutive sampled frames of the image data, the human body contour is detected to change from a frontal contour to a side contour; over consecutive sampled frames of the image data, the human body contour is detected to change from a frontal contour to a side contour and then to a back contour; over a first preset number of consecutive sampled frames of the image data, the proportion of the image occupied by the detected human body contour becomes smaller; over consecutive sampled frames of the image data, the proportion of the image occupied by the detected human body contour becomes smaller and falls below a preset proportion; over a preset number of consecutive sampled frames of the image data, the proportion of the detected human body contour to the total human body contour increases; over consecutive sampled frames of the image data, the proportion of the detected human body contour to the total human body contour increases and exceeds a preset proportion. The frontal contour, the back contour, or the side contour can be determined according to the face information.
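
Two of the behavior checks listed above, the shrinking contour proportion and the frontal-to-side-to-back change, could be sketched as follows. This is an illustrative sketch only: it assumes an upstream detector already supplies, per sampled frame, the fraction of the image occupied by the human body contour and an orientation label; the function names, labels, and thresholds are hypothetical.

```python
from typing import Sequence


def is_moving_away(contour_ratios: Sequence[float],
                   min_frames: int = 3,
                   max_final_ratio: float = 0.2) -> bool:
    """True when the contour's share of the image strictly shrinks over
    the last `min_frames` sampled frames and ends below `max_final_ratio`."""
    if min_frames < 2 or len(contour_ratios) < min_frames:
        return False
    window = contour_ratios[-min_frames:]
    shrinking = all(later < earlier
                    for earlier, later in zip(window, window[1:]))
    return shrinking and window[-1] < max_final_ratio


def has_turned_away(orientations: Sequence[str],
                    pattern: Sequence[str] = ("front", "side", "back")) -> bool:
    """True when the per-frame orientation labels, with consecutive
    repeats collapsed, end with `pattern`."""
    collapsed = []
    for label in orientations:
        if not collapsed or collapsed[-1] != label:
            collapsed.append(label)
    n = len(pattern)
    return len(collapsed) >= n and tuple(collapsed[-n:]) == tuple(pattern)
```

Passing `pattern=("front", "side")` gives the weaker frontal-to-side variant, and setting `max_final_ratio=1.0` approximates the variant that only requires the proportion to shrink.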

It can be understood that, when the image data includes the image data collected by the doorbell device and/or the image data collected by the TV, the step in which the doorbell device or the TV triggers the end of the intercom process or the end of the intercom request process when the image data satisfies the first preset condition includes: when the image data collected by the doorbell device and/or the image data collected by the TV satisfies the first preset condition, the doorbell device or the TV triggers the end of the intercom process or the end of the intercom request process.

The above system may further include a cloud server, and the cloud server may be used to determine that the image data satisfies the first preset condition. Of course, the doorbell device or the TV may also determine that the image data satisfies the first preset condition. Besides the cloud server, the determination may even be made by another electronic device, which then notifies the doorbell device or the TV to trigger the end of the intercom request process or the end of the intercom process; this is not limited here.

Further, the doorbell device or the TV determines the playback mode of the image data collected by the doorbell device according to the character information data and/or the TV state data determined by the TV. Determining the playback mode of the image data of the intercom requesting end according to the character information data and/or device status data determined by the TV includes, but is not limited to:

When it is determined that the TV is in an off state, and it is determined that the interviewed user and the TV are within a preset range, it is determined that the image data of the intercom requesting end is played through the TV in a full-screen display manner.

It can be understood that, during the intercom request process, if the TV or the remote control of the TV collects the first preset voice of the interviewed user at the intercom receiving end, the doorbell device or the TV triggers entry into the intercom process; or, during the intercom process, if the TV or the remote control of the TV collects the second preset voice of the interviewed user at the intercom receiving end, the doorbell device or the TV triggers the end of the intercom.
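
The two voice triggers can be read as transitions of a small state machine. The sketch below is illustrative only: the state names and the two phrases are hypothetical, and the recognized speech is assumed to arrive as plain text from the TV or its remote control.

```python
from enum import Enum, auto


class IntercomState(Enum):
    REQUESTING = auto()   # visitor has rung; intercom request in progress
    TALKING = auto()      # intercom process in progress
    ENDED = auto()


# Hypothetical preset voices; a real product would make these configurable.
FIRST_PRESET_VOICE = "answer the door"
SECOND_PRESET_VOICE = "end the call"


def next_state(state: IntercomState, heard: str) -> IntercomState:
    """Advance the intercom state based on one recognized phrase."""
    phrase = heard.strip().lower()
    if state is IntercomState.REQUESTING and phrase == FIRST_PRESET_VOICE:
        return IntercomState.TALKING
    if state is IntercomState.TALKING and phrase == SECOND_PRESET_VOICE:
        return IntercomState.ENDED
    # Any other phrase, or a phrase heard in the wrong state, changes nothing.
    return state
```

Keeping the phrases state-dependent means the second preset voice has no effect before the intercom has actually started.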

It can be understood that when the image data satisfies the first preset condition and the intercom voice satisfies the second preset condition, the doorbell device or the TV set triggers the intercom process to end. Wherein, the intercom voice satisfies the second preset condition, including: the intercom voice includes a preset keyword or the intercom voice is not acquired for a preset time.

It can be seen that, in the embodiment provided by this application, the image data collected during the intercom process or the intercom request process is determined, the image data including the image data of the intercom requesting end and/or the image data of the intercom receiving end; when the image data satisfies the first preset condition, the intercom process is triggered to end or the intercom request process is triggered to end. Manual operation by the user is therefore not required, which is convenient for the user. Further, by setting the first preset condition and the second preset condition, the accuracy of ending the intercom request or ending the intercom can be improved, and resources can be released reasonably to reduce the impact on the user's use of other devices.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that the present application is not limited by the described action sequence, because in accordance with the present application certain steps may be performed in other orders or concurrently. Secondly, those skilled in the art should also know that the actions and modules involved in the embodiments described in the specification are not necessarily required by the present application.

In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments. In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the device embodiments described above are only illustrative. The division of the above-mentioned units is only a logical function division, and in actual implementation there may be other division methods; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection of devices or units through some interfaces, and may be in electrical or other forms. The units described above as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place, or may be distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment. In addition, the functional units in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units. The above-mentioned integrated units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable memory.
Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods in the various embodiments of the present application. The aforementioned memory includes: a U disk, read-only memory (ROM), random access memory (RAM), a mobile hard disk, a magnetic disk, an optical disk, and other media that can store program codes. Those skilled in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing relevant hardware, and the program can be stored in a computer-readable memory, which can include: a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, etc.

The embodiments of the present application have been introduced in detail above, and the principles and implementations of the present application are described herein by using specific examples. The descriptions of the above embodiments are only used to help understand the methods and core ideas of the present application; at the same time, persons of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementation manner and application scope. In conclusion, the contents of this description should not be construed as a limitation on the present application.

Claims

A visiting intercom control method, characterized in that, applied to electronic equipment, the method comprising:

Determine the image data collected during the intercom process or the intercom request process, and the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end;

When the image data satisfies the first preset condition, the intercom process is triggered to end or the intercom request process is triggered to end.
The method according to claim 1, wherein determining the image data collected during the intercom process or the intercom request process, and triggering the intercom process to end or the intercom request process to end when the image data satisfies the first preset condition, comprises:

Determine the image data collected during the intercom;

When the image data satisfies the first preset condition, the intercom process is triggered to end.
The method according to claim 1, wherein determining the image data collected during the intercom process or the intercom request process, and triggering the intercom process to end or the intercom request process to end when the image data satisfies the first preset condition, comprises:

Determine the image data collected during the intercom request;

When the image data satisfies the first preset condition, the intercom request process is triggered to end.
The method according to any one of claims 1-3, wherein the image data satisfying the first preset condition comprises:

No human characteristic information is detected in the image data.
The method according to any one of claims 1-3, wherein the image data satisfying the first preset condition comprises:

Character preset behavior information is detected in the image data.
The method according to any one of claims 1-3, wherein the image data satisfying the first preset condition comprises:

No character feature information is detected in the image data, and character preset behavior information is detected in the image data.
The method according to claim 4 or 6, wherein the character feature information comprises: at least one of human face information, human body contour information, and human body infrared information.
The method according to claim 5 or 6, wherein the character preset behavior information includes at least one of backward turn information, side turn information, and away information.
The method of claim 8, wherein:

The backward turn information or the side turn information or the distance information is obtained according to the change of the face information and the body contour information.
The method according to claim 4 or 6, wherein no character feature information is detected in the image data, comprising:

Character feature information is not detected in a sample frame image in the image data; or,

No person feature information is detected in the image with the preset sampling frame number in the image data; or,

No person feature information is detected in the image data for a first preset time.
The method according to claim 5 or 6, wherein the preset behavior information of a person is detected in the image data, comprising:

It is detected that the contour of the human body changes from a frontal contour to a side contour in the image of the consecutively sampled frames in the image data; or,

It is detected that the contour of the human body changes from a frontal contour to a side contour and then to a back contour in the image of the consecutive sampling frames in the image data; or,

It is detected that the proportion of the human body contour in the image becomes smaller in the image of the continuous first preset sampling frame number in the image data; or,

In the image of the continuous sampling frame number in the image data, it is detected that the proportion of the human body contour in the image becomes smaller and smaller than the preset proportion; or,

It is detected that the proportion of the outline of the human body to the total outline of the human body increases in the images of the second consecutive preset sampling frames in the image data; or,

It is detected that the proportion of the contours of the human body to the entire contours of the human body increases and is greater than a preset proportion in the images of the consecutively sampled frames in the image data.
The method of claim 11, wherein:

The front profile or the back profile or the side profile is determined according to face information.
The method according to claim 1, wherein, when the image data includes the image data of the intercom requesting end and the image data of the intercom receiving end, triggering the end of the intercom process or the end of the intercom request process when the image data satisfies the first preset condition comprises: when the image data of the intercom requesting end satisfies the first preset condition and the image data of the intercom receiving end satisfies the first preset condition, triggering the end of the intercom process or triggering the end of the intercom request process, wherein the first preset condition satisfied by the image data of the intercom requesting end and the first preset condition satisfied by the image data of the intercom receiving end are the same or different.
The method according to any one of claims 1-13, wherein the electronic device comprises: at least one of an intercom requesting end device, an intercom receiving end device, and a cloud server.
The method according to claim 14, wherein the image data of the intercom requesting end is collected by the intercom requesting end device; the image data of the intercom receiving end is collected by the device of the intercom receiving end.
The method according to claim 14, wherein the method further comprises: determining a playback mode of the image data of the intercom requesting end according to the character information data and/or device status data determined by the intercom receiving end device.
The method according to any one of claims 14-16, wherein the intercom requesting terminal device comprises: at least one of a doorbell outdoor unit, a camera, and an access control outdoor unit; the intercom receiving terminal device comprises: At least one of the doorbell indoor unit, access control indoor unit, TV, router, gateway device, customer premise equipment (CPE), speaker, smart camera, TV box, computer, and mobile phone.
The method according to claim 17, wherein, determining the playback mode of the image data of the intercom requester according to the character information data and/or device status data determined by the intercom receiver device, comprising:

When it is determined that the television is in the running state, it is determined that the image data of the intercom requester is played by the television in a picture-in-picture manner; or,

When it is determined that the television is in a running state, and it is determined that the interviewed user and the television are within a preset range, it is determined that the image data of the intercom requesting end is played through the television in a picture-in-picture manner; or,

When it is determined that the television is in an off state, it is determined that the image data of the intercom requester is played through the television in a full-screen display mode; or,

When it is determined that the television is in an off state, and it is determined that the interviewed user and the television are within a preset range, it is determined that the image data of the intercom requester is played through the television in a full-screen display manner; or,

When it is determined that the mobile phone is in the use state of the interviewed user, it is determined to play the image data of the intercom requester through the mobile phone; or,

When it is determined that the computer is in the use state of the interviewed user, it is determined that the image data of the intercom requesting end is played through the computer.
The method according to claim 17, wherein, during the intercom request process, if the television or the remote control of the television collects a first preset voice of the interviewed user at the intercom receiving end, entry into the intercom process is triggered; or, during the intercom process, if the television or the remote control of the television collects a second preset voice of the interviewed user at the intercom receiving end, the end of the intercom is triggered.
A visiting intercom control method, characterized in that, applied to electronic equipment, the method comprising:

Determine the image data collected in the intercom process and the intercom voice collected during the intercom process, wherein the image data includes the image data of the intercom requesting end and/or the image data of the intercom receiving end, and the intercom voice Including the voice of the intercom requester and/or the voice of the intercom receiver;

When the image data satisfies the first preset condition and the intercom voice satisfies the second preset condition, the intercom process is triggered to end.
The method according to claim 20, wherein the intercom voice satisfies a second preset condition, comprising:

The intercom voice includes preset keywords or the intercom voice is not acquired for a preset time.
A visiting intercom control device, characterized in that the intercom control device comprises:

a determining unit, configured to determine the image data collected during the intercom process or the intercom request process, the image data including the image data of the intercom requesting end and/or the image data of the intercom receiving end;

a triggering unit, configured to trigger the end of the intercom process or trigger the end of the intercom request process when the image data satisfies a first preset condition.
A visiting intercom system, characterized in that the system includes a doorbell device and a TV, the doorbell device being connected to the TV; the doorbell device or the TV is configured to determine the image data collected during the intercom process or the intercom request process, the image data including the image data of the intercom requesting end collected by the doorbell device and/or the image data of the intercom receiving end collected by the TV; and the doorbell device or the TV is configured to trigger the end of the intercom process or the end of the intercom request process when the image data satisfies a first preset condition.
An electronic device, comprising:

one or more processors; memory;

One or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs being configured to perform the method of any one of claims 1-19 or 20-21.
A computer-readable storage medium, characterized in that the computer-readable storage medium stores program codes, and the program codes can be invoked by a processor to execute the method of any one of claims 1-19 or 20-21.