CN111986700A - Method, device, equipment and storage medium for triggering non-contact operation

Method, device, equipment and storage medium for triggering non-contact operation

Info

Publication number
CN111986700A
CN111986700A (application CN202010886923.7A)
Authority
CN
China
Prior art keywords
target
action
image data
sound data
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010886923.7A
Other languages
Chinese (zh)
Inventor
陈文琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Fanxing Huyu IT Co Ltd
Original Assignee
Guangzhou Fanxing Huyu IT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Fanxing Huyu IT Co Ltd filed Critical Guangzhou Fanxing Huyu IT Co Ltd
Priority to CN202010886923.7A priority Critical patent/CN111986700A/en
Publication of CN111986700A publication Critical patent/CN111986700A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 - Network architectures or network communication protocols for network security
    • H04L63/08 - Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861 - Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan

Abstract

The application discloses a method, a device, equipment and a storage medium for triggering non-contact operation, and belongs to the field of computer technology. The method comprises the following steps: acquiring sound data collected by an audio acquisition device, and determining, among stored reference sound data, target reference sound data matched with the sound data; in response to determining the target reference sound data matched with the sound data, determining a target action and a target operation corresponding to the target reference sound data based on the correspondence between reference sound data, actions, and operations; and detecting whether the target action exists in the acquired image data based on an action detection model, and executing the target operation if the target action exists. The present application can improve operation convenience.

Description

Method, device, equipment and storage medium for triggering non-contact operation
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for contactless operation triggering.
Background
With the development of computer technology, people's entertainment modes have changed greatly, and watching live broadcasts has become an important form of entertainment in daily life.
In order to enhance the interaction between the anchor and the audience, technicians provide various functions for the anchor to use, and the anchor can control the opening and closing of these functions with a mouse or a keyboard. For example, when the anchor uses a lottery function, the anchor can control the start and end of the lottery through the mouse or keyboard.
In the process of implementing the present application, the inventor finds that the prior art has at least the following problems:
the start and end of a function are controlled by the anchor through a mouse or keyboard, but this contact-based triggering method is not suitable for all scenarios. For example, when the anchor is dancing, it is inconvenient to run a lottery: the anchor must stop dancing and then control the start and end of the lottery through the mouse or keyboard, which results in poor operation convenience.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for triggering non-contact operation, which can solve the problem of poor operation convenience. The technical scheme is as follows:
in one aspect, a method for contactless operation triggering is provided, where the method includes:
acquiring sound data acquired by audio acquisition equipment, and determining target reference sound data matched with the sound data in stored reference sound data;
in response to determining target reference sound data matched with the sound data, determining target actions and target operations corresponding to the target reference sound data based on the corresponding relation of the reference sound data, the actions and the operations;
and when the target action is detected in the acquired image data based on the action detection model, executing the target operation.
Optionally, when it is detected that the target motion exists in the acquired image data based on the motion detection model, the target operation is performed, including:
acquiring image data acquired within a preset time period, and determining position information of human key points in the image data based on a key point extraction model;
determining a first action corresponding to the image data based on an action detection model and the position information of the human key points in the image data;
and when the first action is the same as the target action, executing the target operation.
Optionally, the position information of the human body key point includes position information of a head key point, the motion detection model includes a head motion detection model, the first motion includes a first head motion, and the target motion includes a target head motion;
alternatively,
the human body key point position information comprises hand key point position information, the action detection model comprises a hand action detection model, the first action comprises a first hand action, and the target action comprises a target hand action.
Optionally, the position information of the human body key points includes position information of hand key points and position information of head key points, the motion detection model includes a hand motion detection model and a head motion detection model, the first motion includes a first hand motion and a first head motion, and the target motion includes a target hand motion and a target head motion;
the determining a first action corresponding to the image data based on the action detection model and the position information of the human key points in the image data includes:
and determining a first hand motion corresponding to the image data based on a hand motion detection model and the hand key point position information, and determining a first head motion corresponding to the image data based on a head motion detection model and the head key point position information.
Optionally, before determining the target action and the target operation corresponding to the target reference sound data based on the corresponding relationship between the reference sound data and the action and operation, the method further includes:
carrying out face detection on the acquired image data to obtain face image data in the image data;
sending a face image data request carrying an account identifier of a current login account to a server;
receiving reference face image data corresponding to the account identification sent by the server;
determining that the face image data matches the reference face image data.
Optionally, the target operation is to start a target function or to close the target function.
In another aspect, there is provided a contactless operation triggering apparatus, the apparatus comprising:
the acquisition module is used for acquiring sound data acquired by the audio acquisition equipment and determining target reference sound data matched with the sound data in the stored reference sound data;
the determining module is used for responding to the target reference sound data matched with the sound data, and determining a target action and a target operation corresponding to the target reference sound data based on the corresponding relation of the reference sound data, the action and the operation;
and the execution module is used for executing the target operation when the target action is detected in the acquired image data based on the action detection model.
Optionally, the execution module is configured to:
acquiring image data acquired within a preset time period, and determining position information of human key points in the image data based on a key point extraction model;
determining a first action corresponding to the image data based on an action detection model and the position information of the human key points in the image data;
and when the first action is the same as the target action, executing the target operation.
Optionally, the position information of the human body key point includes position information of a head key point, the motion detection model includes a head motion detection model, the first motion includes a first head motion, and the target motion includes a target head motion;
alternatively,
the human body key point position information comprises hand key point position information, the action detection model comprises a hand action detection model, the first action comprises a first hand action, and the target action comprises a target hand action.
Optionally, the position information of the human body key points includes position information of hand key points and position information of head key points, the motion detection model includes a hand motion detection model and a head motion detection model, the first motion includes a first hand motion and a first head motion, and the target motion includes a target hand motion and a target head motion;
the determining module is configured to:
and determining a first hand motion corresponding to the image data based on a hand motion detection model and the hand key point position information, and determining a first head motion corresponding to the image data based on a head motion detection model and the head key point position information.
Optionally, the method further includes a matching module, where the matching module is configured to:
carrying out face detection on the acquired image data to obtain face image data in the image data;
sending a face image data request carrying an account identifier of a current login account to a server;
receiving reference face image data corresponding to the account identification sent by the server;
determining that the face image data matches the reference face image data.
Optionally, the target operation is to start a target function or to close the target function.
In yet another aspect, a computer device is provided, the computer device comprising a processor and a memory, the memory having instructions stored therein, the processor executing the instructions to cause the computer device to implement the method for contactless operation triggering.
In yet another aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing instructions, execution of which by a computer device causes the computer device to implement the method for contactless operation triggering.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
this application is through prior evidence sound data and target benchmark sound data whether the same, through verifying the back, detects the image data who gathers and whether there is the target action, if exist, then carries out the target operation that target benchmark sound data correspond, and the user only needs to send corresponding sound like this and puts out corresponding action and just can control and carry out corresponding operation, need not to control through operation mouse and keyboard, has also promoted the operation convenience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for contactless operation triggering according to an embodiment of the present disclosure;
FIG. 3 is a schematic page diagram illustrating a method for contactless operation triggering according to an embodiment of the present application;
FIG. 4 is a schematic page diagram illustrating a method for triggering a contactless operation according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a device for triggering contactless operation according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a terminal provided in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The embodiment of the application provides a method for triggering non-contact operation. Non-contact operation triggering means that a user can trigger a corresponding operation without operating hardware such as a keyboard or a mouse, and the method can be implemented by a terminal. The terminal can be a mobile phone, a desktop computer, a tablet computer, a notebook computer, a smart wearable device, and the like, and can be provided with components such as a screen, a microphone, and a camera. The terminal may have image display, audio acquisition, and image acquisition functions, and may have application programs installed, for example, a live broadcast application, a short video application, and the like. It should be noted that this solution is described below by taking a live broadcast application as an example.
As shown in fig. 1, when the anchor uses a live broadcast application and clicks the live broadcast control, the terminal acquires permission to use the audio acquisition device and the image acquisition device, synthesizes the sound data and image data they collect into video data, and sends the video data to the server. The server acquires the terminal identifiers of the audience members in the anchor's live room, determines the audience terminals according to those identifiers, and forwards the video data to them.
Fig. 2 is a flowchart of a method for contactless operation triggering according to an embodiment of the present application. Referring to fig. 2, the process includes:
step 201, acquiring sound data acquired by an audio acquisition device, and determining target reference sound data matched with the sound data in the stored reference sound data.
In implementation, while the anchor is live streaming, the terminal collects sound data through the audio acquisition device and detects the collected sound data to obtain its frequency. The terminal then obtains the frequencies corresponding to the internally stored reference sound data and compares the frequency of the collected sound data with them. If the frequency of a piece of stored reference sound data matches the frequency of the sound data, that reference sound data is determined to be the target reference sound data.
For example, when the anchor claps during a live broadcast, a clapping sound is produced. The terminal detects the clapping sound collected by the microphone to obtain its frequency, obtains the internally stored reference sound data, and compares the frequency of the clapping sound with the frequencies of the reference sound data. If reference sound data whose frequency matches that of the clapping sound is found, that reference sound data is determined to be the target reference sound data.
Here, the terminal may detect not only the frequency of the sound data but also its duration, pitch, number of beats, and semantic content.
It should be noted that detecting the collected sound data and determining the target reference sound data matched with it is a low-overhead process, while the image-data detection function described below remains switched off during it. Only after the target reference sound data matched with the sound data has been determined does the terminal start the image-data detection function, which reduces the waste of computing resources.
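The sound-matching step can be illustrated with a short sketch. The following Python code is a minimal illustration only, not the patent's implementation; the dominant-frequency comparison, the ReferenceSound record, and the FREQ_TOLERANCE_HZ threshold are all assumptions made for the example.

```python
# Minimal sketch of the sound-matching step, assuming mono PCM input and a
# simple dominant-frequency comparison; ReferenceSound and FREQ_TOLERANCE_HZ
# are illustrative assumptions, not names from the patent.
from dataclasses import dataclass
import numpy as np

FREQ_TOLERANCE_HZ = 20.0  # assumed matching tolerance

@dataclass
class ReferenceSound:
    sound_id: str
    dominant_freq_hz: float  # frequency stored for the reference sound

def dominant_frequency(samples: np.ndarray, sample_rate: int) -> float:
    """Return the dominant frequency of a mono audio frame via FFT."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

def match_reference_sound(samples, sample_rate, references):
    """Compare the collected sound's frequency with each stored reference
    frequency and return the matching reference sound, if any."""
    freq = dominant_frequency(samples, sample_rate)
    for ref in references:
        if abs(ref.dominant_freq_hz - freq) <= FREQ_TOLERANCE_HZ:
            return ref  # target reference sound data
    return None  # no match: image-data detection stays switched off
```

Returning None when no reference matches mirrors the gating described above: the heavier image-detection pipeline is only started once a target reference sound has been found.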
Optionally, before the anchor performs live broadcasting, the anchor may set a function switch.
In implementation, the anchor can start the live broadcast application and trigger the account control, after which the terminal displays an account setting page. The account setting page can display the anchor's avatar and nickname inside an account information editing control; when this control is triggered, the terminal displays an account information editing page where the anchor can edit his or her own account information. A function switch control may also be displayed in the account page. The anchor can click the function switch control, and the terminal then displays the function switch page, which shows the existing function switch list; the list displays reference audio, head action, hand action, function name, and state, as shown in fig. 3. Below the function switch list is an adding control. The user can trigger the adding control, and the terminal displays a function switch adding interface in which the user can, in turn, add or select reference audio, head action, and hand action, select a function name and a state, set the interval duration between audio detection and action detection according to his or her own habits, and trigger the confirm control, as shown in fig. 4. The anchor has then completed the setting of the function switch.
It should be understood that setting the function switch generates the user-customized correspondence between reference sound data, actions, and operations, so that the terminal can execute the processing of step 202 described below.
It should be noted that the head action may include a facial action, such as blinking or opening the mouth, which is not limited herein.
The hand action may include static hand actions and dynamic hand actions, such as an "OK" gesture or a finger-snapping motion, which is not limited herein.
Step 202, in response to determining the target reference sound data matched with the sound data, determining the target action and the target operation corresponding to the target reference sound data based on the corresponding relationship between the reference sound data, the action and the operation.
The target action comprises a target hand action and/or a target head action, the terminal stores the correspondence between reference sound data, actions, and operations, and each target action is an identifier identifying one action.
In implementation, after the terminal determines the target reference sound data matched with the sound data collected by the audio acquisition device, the terminal may determine the target action and target operation corresponding to the target reference sound data based on the stored correspondence between reference sound data, actions, and operations.
It should be noted that the target operation is to start the target function or to close the target function.
For example, if the terminal stores the correspondence among reference audio, head action, hand action, function, and state, then after determining the target reference sound data, the terminal can identify the target reference audio corresponding to it and determine the corresponding target head action (the head action), target hand action (the hand action), lottery (the function), and start (the state).
It should be noted that the first action and the target action are both identifiers corresponding to simple basic actions, which reduces computational complexity. Meanwhile, to ensure detection accuracy and reduce the false trigger rate, the scheme detects a combination of two simple actions.
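As a rough sketch of the correspondence that the function-switch page generates and that step 202 consults, one might store records like the following; the field names and the resolve_trigger helper are illustrative assumptions, not taken from the patent.

```python
# Sketch of the user-configured correspondence from the function-switch page
# (fig. 4): reference sound -> (head action, hand action, function, state).
# The record layout and example values are assumptions for illustration.
from typing import NamedTuple, Optional

class FunctionSwitch(NamedTuple):
    reference_sound_id: str  # e.g. the stored clap reference
    target_head_action: str  # identifier of a simple head action
    target_hand_action: str  # identifier of a simple hand action
    function: str            # e.g. "lottery"
    state: str               # "start" or "stop"

SWITCHES = [
    FunctionSwitch("clap", "nod", "ok_gesture", "lottery", "start"),
    FunctionSwitch("finger_snap", "shake_head", "fist", "lottery", "stop"),
]

def resolve_trigger(reference_sound_id: str) -> Optional[FunctionSwitch]:
    """Given the matched target reference sound, look up the target
    actions and target operation configured by the user."""
    for sw in SWITCHES:
        if sw.reference_sound_id == reference_sound_id:
            return sw
    return None
```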
And step 203, when the target action exists in the acquired image data based on the action detection model, executing the target operation.
In implementation, after the terminal determines a target action and a target operation corresponding to the target reference sound data, first, the terminal may acquire image data acquired by the image acquisition device within a preset time duration.
The preset duration may be set manually or may be a default value of the system.
For example, after the anchor claps, the terminal determines the head action (nod), the hand action ("OK" gesture), and the target operation (start the lottery) corresponding to the clapping sound. The user can then nod and make the "OK" gesture with a hand within the preset duration, and the terminal collects the image data of the user nodding and making the "OK" gesture through the image acquisition device.
It should be noted that, in this embodiment, the hand motion and the head motion may be dynamic motions or static motions, and are not limited herein.
Secondly, after the image data is acquired, the terminal can determine a first action corresponding to the image data through the key point extraction model and the action detection model, and the specific processing can be as follows:
firstly, determining the position information of the human body key points in the image data based on a key point extraction model.
The key point extraction model comprises a hand key point extraction model and a head key point extraction model, and the human body key point position information comprises hand key point position information and head key point position information.
In implementation, the terminal may input the image data into the hand key point extraction model and the head key point extraction model, and may output the hand key point position information and the head key point position information in the image data by calculating the hand key point extraction model and the head key point extraction model.
And secondly, determining a first action corresponding to the image data based on the action detection model and the position information of the key points of the human body in the image data.
The motion detection model comprises a hand motion detection model and a head motion detection model, and the first motion comprises a first hand motion and a first head motion.
In implementation, after acquiring the position information of the hand key point and the position information of the head key point, the position information of the hand key point may be input into the hand motion detection model, and the position information of the head key point may be input into the head motion detection model, and further, the hand motion detection model may determine the first hand motion corresponding to the image data, and the head motion detection model may determine the first head motion corresponding to the image data.
After the first action corresponding to the image data is determined, when the first action is the same as the target action, the target operation is executed.
In implementation, through the above processing the terminal obtains the first head action, the first hand action, the target head action, and the target hand action. The terminal then checks whether the first head action is the same as the target head action and whether the first hand action is the same as the target hand action; if both are the same, the terminal executes the target operation.
For example, if the first head action is a nod, the target head action is a nod, the first hand action is the "OK" gesture, and the target hand action is the "OK" gesture, the verification succeeds and the lottery-starting process is executed.
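Step 203 can be sketched as a small pipeline, assuming the keypoint-extraction and action-detection models are available as callables; all model names here (extract_hand_keypoints, detect_hand_action, and so on) are placeholders rather than APIs defined by the patent.

```python
# Sketch of step 203: keypoint extraction, action detection, and comparison
# with the target actions; `models` bundles hypothetical model callables.
def detect_and_execute(image_frames, switch, models, execute_operation):
    """Run keypoint extraction and action detection over image data
    collected within the preset duration, then compare with the targets."""
    hand_kp = models.extract_hand_keypoints(image_frames)
    head_kp = models.extract_head_keypoints(image_frames)

    first_hand_action = models.detect_hand_action(hand_kp)  # e.g. "ok_gesture"
    first_head_action = models.detect_head_action(head_kp)  # e.g. "nod"

    # Both simple actions must match, which lowers the false-trigger rate.
    if (first_hand_action == switch.target_hand_action
            and first_head_action == switch.target_head_action):
        execute_operation(switch.function, switch.state)  # e.g. start lottery
```

Requiring both the head action and the hand action to match reflects the design choice noted above: combining two simple actions keeps each detector cheap while reducing false triggers.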
Optionally, in some special scenarios the security of non-contact operation triggering needs to be considered. Based on this consideration, a security check is performed before step 202 is executed; the specific processing may be as follows:
first, face detection is performed on image data to obtain face image data in the image data.
In an implementation, after the terminal acquires the image data, the terminal may input the image data into a face detection model, and the face detection model may output the face image data in the image data.
Secondly, a face image data request carrying the account identification of the current login account is sent to the server.
In implementation, after the terminal acquires the face image data in the image data, the terminal may acquire an account identifier of the current login account and generate a face image data request based on the account identifier of the current login account and the face image data in the image data. Then, the face image data request is sent to the server.
Next, reference face image data corresponding to the account identifier sent by the server is received.
In an implementation, after the server receives the face image data request, the server may obtain an account identifier in the face image data request, and obtain reference face image data corresponding to the account identifier according to a correspondence relationship between the account identifier and the reference face image data. Then, the acquired reference face image data is transmitted to the terminal.
Then, it is determined that the face image data matches the reference face image data.
In implementation, the terminal may receive the reference facial image data sent by the server, and then the terminal may input the facial image data and the reference facial image data into the face matching model, and the matching result of the facial image data and the reference facial image data may be obtained through the calculation of the face matching model.
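A hedged sketch of this security check follows; the server endpoint, the JSON field names, and the face_detector/face_matcher callables are assumptions introduced for illustration, not interfaces defined by the patent.

```python
# Sketch of the optional security check: detect the face on the terminal,
# fetch the account's reference face image data from the server, and match.
# The URL and response fields are illustrative assumptions only.
import requests

def verify_anchor_face(image_frame, account_id, face_detector, face_matcher,
                       server_url="https://example.com/api/reference_face"):
    """Return True if the face in the image matches the account's
    reference face image data fetched from the server."""
    face_image = face_detector(image_frame)  # face image data, or None
    if face_image is None:
        return False
    # Face image data request carrying the account identifier.
    resp = requests.post(server_url, json={"account_id": account_id})
    reference_face = resp.json()["reference_face"]  # assumed response field
    return face_matcher(face_image, reference_face)  # matching result
```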
Optionally, after the terminal executes the target operation, the following processing may be performed:
first, when the anchor wants to stop the target function currently being executed, the terminal may acquire sound data collected by the audio collecting device, and determine target reference sound data matching the sound data among the stored reference sound data.
For example, the anchor starts the lottery function during a live broadcast, and after a certain period needs to stop it. The anchor can snap his or her fingers; the terminal then detects the snapping sound collected by the microphone to obtain its frequency, obtains the internally stored reference sound data, and compares the frequency of the snapping sound with the frequencies of the reference sound data. If reference sound data whose frequency matches that of the snapping sound is detected, that reference sound data is determined to be the target reference sound data.
And secondly, acquiring image data acquired by the image acquisition equipment.
For example, after the anchor snaps his or her fingers, the anchor can shake the head and complete a "fist" gesture with the hand within the preset duration, and the terminal can then acquire the image data of the user shaking the head and completing the "fist" gesture.
Next, a first motion corresponding to the image data is determined based on the motion detection model.
In implementation, the terminal determines position information of the human body key points in the image data based on the key point extraction model, and then determines a first action corresponding to the image data based on the action detection model and the position information of the human body key points in the image data.
For example, the terminal inputs image data including "shake head" and "fist making" motions into the key point extraction model to obtain hand key point position information and head key point position information in the image data, and then the terminal can input the hand key point position information and the head key point position information into the hand motion detection model and the head motion detection model, respectively, and further can obtain a first hand motion and a first head motion.
Next, the target action and target operation corresponding to the target reference sound data are determined based on the correspondence between reference sound data, actions, and operations.
For example, if the terminal stores the correspondence among reference audio, head action, hand action, function name, and state, then after identifying the target reference sound data, the terminal can identify the target reference audio corresponding to it and determine the corresponding target head action (the head action), target hand action (the hand action), lottery (the function), and off (the state).
Then, if the first action is the same as the target action, the target operation is executed.
For example, if the first head action is a head shake, the target head action is a head shake, the first hand action is the "fist" gesture, and the target hand action is the "fist" gesture, the verification succeeds and the lottery-stopping process is executed.
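Tying the previous sketches together, an end-to-end trigger loop might look as follows; it reuses the hypothetical helpers defined earlier and is a sketch of the described flow, not the patent's implementation.

```python
# End-to-end sketch under the assumptions of the previous snippets:
# match sound, resolve the configured switch, verify the face, detect the
# action combination, then execute the target operation.
def trigger_loop(audio_frame, sample_rate, image_frames, account_id,
                 references, models, face_detector, face_matcher, execute):
    ref = match_reference_sound(audio_frame, sample_rate, references)
    if ref is None:
        return  # keep image detection switched off
    switch = resolve_trigger(ref.sound_id)
    if switch is None:
        return  # no function switch configured for this sound
    if not verify_anchor_face(image_frames[0], account_id,
                              face_detector, face_matcher):
        return  # security check failed
    detect_and_execute(image_frames, switch, models, execute)
```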
In the present application, it is first verified whether the collected sound data matches target reference sound data. After this verification passes, the acquired image data is checked for the presence of the target action, and if it is present, the target operation corresponding to the target reference sound data is executed. In this way, the user only needs to make the corresponding sound and perform the corresponding action to trigger the corresponding operation, without operating a mouse or keyboard, which improves operation convenience.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
Fig. 5 is a schematic structural diagram of a device for contactless operation triggering according to an embodiment of the present application. Referring to fig. 5, the device may be the terminal described above, and the device includes:
an obtaining module 510, configured to obtain sound data collected by an audio collecting device, and determine target reference sound data matched with the sound data in stored reference sound data;
a determining module 520, configured to determine, in response to determining that target reference sound data matches the sound data, a target action and a target operation corresponding to the target reference sound data based on a correspondence relationship between the reference sound data and the action and operation;
an executing module 530, configured to execute the target operation when it is detected that the target motion exists in the acquired image data based on the motion detection model.
Optionally, the executing module 530 is configured to:
acquiring image data acquired within a preset time period, and determining position information of human key points in the image data based on a key point extraction model;
determining a first action corresponding to the image data based on an action detection model and the position information of the human key points in the image data;
and when the first action is the same as the target action, executing the target operation.
Optionally, the position information of the human body key point includes position information of a head key point, the motion detection model includes a head motion detection model, the first motion includes a first head motion, and the target motion includes a target head motion;
alternatively,
the human body key point position information comprises hand key point position information, the action detection model comprises a hand action detection model, the first action comprises a first hand action, and the target action comprises a target hand action.
Optionally, the position information of the human body key points includes position information of hand key points and position information of head key points, the motion detection model includes a hand motion detection model and a head motion detection model, the first motion includes a first hand motion and a first head motion, and the target motion includes a target hand motion and a target head motion;
the determining module 520 is configured to:
and determining a first hand motion corresponding to the image data based on a hand motion detection model and the hand key point position information, and determining a first head motion corresponding to the image data based on a head motion detection model and the head key point position information.
Optionally, the method further includes a matching module, where the matching module is configured to:
carrying out face detection on the acquired image data to obtain face image data in the image data;
sending a face image data request carrying an account identifier of a current login account to a server;
receiving reference face image data corresponding to the account identification sent by the server;
determining that the face image data matches the reference face image data.
Optionally, the target operation is to start a target function or to close the target function.
In the present application, it is first verified whether the collected sound data matches target reference sound data. After this verification passes, the acquired image data is checked for the presence of the target action, and if it is present, the target operation corresponding to the target reference sound data is executed. In this way, the user only needs to make the corresponding sound and perform the corresponding action to trigger the corresponding operation, without operating a mouse or keyboard, which improves operation convenience.
It should be noted that: in the contactless operation triggering device provided in the above embodiment, when the switch is controlled, only the division of the above functional modules is taken as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the above described functions. In addition, the embodiments of the method for triggering contactless operations provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the embodiments of the method for triggering contactless operations, which are not described herein again.
Fig. 6 shows a block diagram of a terminal 600 according to an exemplary embodiment of the present application. The terminal may be the terminal in the above embodiments, and the terminal 600 may be a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The terminal 600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 600 includes: a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 601 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 602 is used to store at least one instruction for execution by processor 601 to implement the method for contactless operation triggering provided by the method embodiments herein.
In some embodiments, the terminal 600 may further optionally include: a peripheral interface 603 and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 603 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 604, a touch screen display 605, a camera 606, an audio circuit 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 604 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 604 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 604 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 605 is a touch display screen, the display screen 605 also has the ability to capture touch signals on or over the surface of the display screen 605. The touch signal may be input to the processor 601 as a control signal for processing. At this point, the display 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 605 may be one, providing the front panel of the terminal 600; in other embodiments, the display 605 may be at least two, respectively disposed on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the display 605 may be a flexible display disposed on a curved surface or on a folded surface of the terminal 600. Even more, the display 605 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display 605 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 606 is used to capture images or video. Optionally, camera assembly 606 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 606 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
Audio circuitry 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 601 for processing or inputting the electric signals to the radio frequency circuit 604 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 600. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuitry 607 may also include a headphone jack.
The positioning component 608 is used for determining the current geographic location of the terminal 600 to implement navigation or LBS (Location Based Service). The positioning component 608 may be based on the United States' GPS (Global Positioning System), the Chinese BeiDou system, the Russian GLONASS system, or the European Union's Galileo system.
Power supply 609 is used to provide power to the various components in terminal 600. The power supply 609 may be ac, dc, disposable or rechargeable. When the power supply 609 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 600 also includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: acceleration sensor 611, gyro sensor 612, pressure sensor 613, fingerprint sensor 614, optical sensor 615, and proximity sensor 616.
The acceleration sensor 611 may detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 600. For example, the acceleration sensor 611 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 601 may control the touch screen display 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 612 may detect a body direction and a rotation angle of the terminal 600, and the gyro sensor 612 and the acceleration sensor 611 may cooperate to acquire a 3D motion of the user on the terminal 600. The processor 601 may implement the following functions according to the data collected by the gyro sensor 612: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 613 may be disposed on a side frame of the terminal 600 and/or on a lower layer of the touch display screen 605. When the pressure sensor 613 is disposed on the side frame of the terminal 600, a user's holding signal of the terminal 600 can be detected, and the processor 601 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed at the lower layer of the touch display screen 605, the processor 601 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 605. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 614 is used for collecting a fingerprint of a user, and the processor 601 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the identity of the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 614 may be disposed on the front, back, or side of the terminal 600. When a physical button or vendor Logo is provided on the terminal 600, the fingerprint sensor 614 may be integrated with the physical button or vendor Logo.
The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, processor 601 may control the display brightness of touch display 605 based on the ambient light intensity collected by optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 605 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 605 is turned down. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
A proximity sensor 616, also known as a distance sensor, is typically disposed on the front panel of the terminal 600. The proximity sensor 616 is used to measure the distance between the user and the front surface of the terminal 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually decreases, the processor 601 controls the touch display 605 to switch from the bright-screen state to the screen-off state; when the proximity sensor 616 detects that the distance gradually increases, the processor 601 controls the touch display 605 to switch from the screen-off state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is not intended to be limiting of terminal 600 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, is also provided that includes instructions executable by a processor in a terminal to perform the method for contactless operation triggering in the above-described embodiments. For example, the computer-readable storage medium may be a Read-only Memory (ROM), a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of contactless operation triggering, the method comprising:
acquiring sound data acquired by audio acquisition equipment, and determining target reference sound data matched with the sound data in stored reference sound data;
in response to determining target reference sound data matched with the sound data, determining target actions and target operations corresponding to the target reference sound data based on the corresponding relation of the reference sound data, the actions and the operations;
and when the target action is detected in the acquired image data based on the action detection model, executing the target operation.
2. The method of claim 1, wherein performing the target operation when the target motion is detected to exist in the captured image data based on a motion detection model comprises:
acquiring image data acquired within a preset time period, and determining position information of human key points in the image data based on a key point extraction model;
determining a first action corresponding to the image data based on an action detection model and the position information of the human key points in the image data;
and when the first action is the same as the target action, executing the target operation.
3. The method of claim 2, wherein the human body keypoint location information comprises head keypoint location information, the motion detection model comprises a head motion detection model, the first motion comprises a first head motion, and the target motion comprises a target head motion;
alternatively,
the human body key point position information comprises hand key point position information, the action detection model comprises a hand action detection model, the first action comprises a first hand action, and the target action comprises a target hand action.
4. The method of claim 2, wherein the body keypoint location information comprises hand keypoint location information and head keypoint location information, the motion detection model comprises a hand motion detection model and a head motion detection model, the first motion comprises a first hand motion and a first head motion, the target motion comprises a target hand motion and a target head motion;
the determining a first action corresponding to the image data based on the action detection model and the position information of the human key points in the image data includes:
and determining a first hand motion corresponding to the image data based on a hand motion detection model and the hand key point position information, and determining a first head motion corresponding to the image data based on a head motion detection model and the head key point position information.
5. The method according to claim 1, wherein before determining the target action and the target operation corresponding to the target reference sound data based on the correspondence between the reference sound data, the action and the operation, the method further comprises:
carrying out face detection on the acquired image data to obtain face image data in the image data;
sending a face image data request carrying an account identifier of a current login account to a server;
receiving reference face image data corresponding to the account identification sent by the server;
determining that the face image data matches the reference face image data.
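A sketch of the verification flow of claim 5, assuming a face detector and a face comparison function are available as callables; the server URL and request field name are invented for illustration and do not come from the patent.

```python
import requests

def identity_check(image_data, account_id, detect_face, faces_match):
    """Claim 5 sketch: match the on-camera face to the logged-in account's face."""
    face = detect_face(image_data)             # face detection on acquired image data
    if face is None:
        return False
    # Face image data request carrying the account identifier of the current
    # login account (endpoint and field name are illustrative assumptions).
    resp = requests.post("https://example.com/reference-face",
                         data={"account_id": account_id})
    reference_face = resp.content              # reference face image data from server
    return faces_match(face, reference_face)   # proceed only if the faces match
```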
6. The method of claim 1, wherein the target operation is to start a target function or to shut down a target function.
7. A device for triggering a non-contact operation, characterized in that the device comprises:
an acquisition module, configured to acquire sound data collected by an audio acquisition device, and determine, among stored reference sound data, target reference sound data that matches the sound data;
a determining module, configured to determine, in response to determining the target reference sound data that matches the sound data, a target action and a target operation corresponding to the target reference sound data based on a correspondence among reference sound data, actions, and operations;
and an execution module, configured to execute the target operation when the target action is detected in acquired image data based on an action detection model.
8. The apparatus of claim 7, wherein the execution module is configured to:
acquiring image data collected within a preset time period, and determining position information of human body key points in the image data based on a key point extraction model;
determining a first action corresponding to the image data based on the action detection model and the position information of the human body key points in the image data;
determining that the target action exists in the image data if the first action is the same as the target action, and determining that the target action does not exist in the image data if the first action is different from the target action.
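The modular split of claims 7 and 8 might be organized as below; the class and method names are illustrative only and each module wraps one stage of the claim 1 method.

```python
class AcquisitionModule:
    """Matches acquired sound data against stored reference sound data."""
    def __init__(self, matcher):
        self.matcher = matcher
    def run(self, sound_data):
        return self.matcher(sound_data)    # target reference sound id, or None

class DeterminingModule:
    """Looks up the (target action, target operation) pair for a reference sound."""
    def __init__(self, table):
        self.table = table                 # reference id -> (action, operation)
    def run(self, ref_id):
        return self.table[ref_id]

class ExecutionModule:
    """Runs the target operation once the detected action matches the target."""
    def run(self, first_action, target_action, operation):
        if first_action == target_action:  # the comparison recited in claim 8
            operation()
```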
9. A computer device, comprising a processor and a memory, wherein the memory stores at least one instruction that is loaded and executed by the processor to perform the operations performed by the method for triggering a non-contact operation according to any one of claims 1 to 6.
10. A computer-readable storage medium, wherein the storage medium stores at least one instruction that is loaded and executed by a processor to perform the operations performed by the method for triggering a non-contact operation according to any one of claims 1 to 6.
CN202010886923.7A 2020-08-28 2020-08-28 Method, device, equipment and storage medium for triggering non-contact operation Pending CN111986700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010886923.7A CN111986700A (en) 2020-08-28 2020-08-28 Method, device, equipment and storage medium for triggering non-contact operation

Publications (1)

Publication Number Publication Date
CN111986700A (en) 2020-11-24

Family

ID=73440905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010886923.7A Pending CN111986700A (en) 2020-08-28 2020-08-28 Method, device, equipment and storage medium for triggering non-contact operation

Country Status (1)

Country Link
CN (1) CN111986700A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170034579A1 (en) * 2015-07-31 2017-02-02 John Caudell Method for Sound Recognition Task Trigger
CN106791921A (en) * 2016-12-09 2017-05-31 北京小米移动软件有限公司 The processing method and processing device of net cast
CN107124664A (en) * 2017-05-25 2017-09-01 百度在线网络技术(北京)有限公司 Exchange method and device applied to net cast
CN108076392A (en) * 2017-03-31 2018-05-25 北京市商汤科技开发有限公司 Living broadcast interactive method, apparatus and electronic equipment
CN110446115A (en) * 2019-07-22 2019-11-12 腾讯科技(深圳)有限公司 Living broadcast interactive method, apparatus, electronic equipment and storage medium
CN110881134A (en) * 2019-11-01 2020-03-13 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN111274910A (en) * 2020-01-16 2020-06-12 腾讯科技(深圳)有限公司 Scene interaction method and device and electronic equipment
CN111353805A (en) * 2018-12-24 2020-06-30 阿里巴巴集团控股有限公司 Lottery drawing processing method and device in live broadcast and electronic equipment
CN111382624A (en) * 2018-12-28 2020-07-07 杭州海康威视数字技术股份有限公司 Action recognition method, device, equipment and readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114697686A (en) * 2020-12-25 2022-07-01 北京达佳互联信息技术有限公司 Online interaction method and device, server and storage medium
CN114697686B (en) * 2020-12-25 2023-11-21 北京达佳互联信息技术有限公司 Online interaction method and device, server and storage medium

Similar Documents

Publication Publication Date Title
CN110971930B (en) Live virtual image broadcasting method, device, terminal and storage medium
CN108833818B (en) Video recording method, device, terminal and storage medium
CN111147878B (en) Stream pushing method and device in live broadcast and computer storage medium
CN109348247B (en) Method and device for determining audio and video playing time stamp and storage medium
CN109246123B (en) Media stream acquisition method and device
CN110278464B (en) Method and device for displaying list
CN108762881B (en) Interface drawing method and device, terminal and storage medium
CN111107389B (en) Method, device and system for determining live broadcast watching time length
CN109068008B (en) Ringtone setting method, device, terminal and storage medium
CN110533585B (en) Image face changing method, device, system, equipment and storage medium
CN110418152B (en) Method and device for carrying out live broadcast prompt
CN110677713B (en) Video image processing method and device and storage medium
CN109783176B (en) Page switching method and device
CN109218169B (en) Instant messaging method, device and storage medium
CN108401194B (en) Time stamp determination method, apparatus and computer-readable storage medium
CN111061369B (en) Interaction method, device, equipment and storage medium
CN112118482A (en) Audio file playing method and device, terminal and storage medium
CN108228052B (en) Method and device for triggering operation of interface component, storage medium and terminal
CN111986700A (en) Method, device, equipment and storage medium for triggering non-contact operation
CN111464829B (en) Method, device and equipment for switching media data and storage medium
CN114595019A (en) Theme setting method, device and equipment of application program and storage medium
CN109819308B (en) Virtual resource acquisition method, device, terminal, server and storage medium
CN112015612B (en) Method and device for acquiring stuck information
CN113485596A (en) Virtual model processing method and device, electronic equipment and storage medium
CN108881715B (en) Starting method and device of shooting mode, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination