CN116149477A - Interaction method, interaction device, electronic equipment and storage medium - Google Patents

Interaction method, interaction device, electronic equipment and storage medium Download PDF

Info

Publication number
CN116149477A
CN116149477A (Application CN202310126676.4A)
Authority
CN
China
Prior art keywords
canvas
interaction
hand
gesture
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310126676.4A
Other languages
Chinese (zh)
Inventor
林澜波
洪德祥
曹立
杨佳霖
邵杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202310126676.4A priority Critical patent/CN116149477A/en
Publication of CN116149477A publication Critical patent/CN116149477A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04845 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/15 Conference systems

Abstract

The disclosure relates to an interaction method, an interaction apparatus, an electronic device, and a storage medium, in the technical fields of computer vision and deep learning. The method comprises the following steps: generating a canvas in response to a received trigger instruction; acquiring a hand image of a first object in real time and determining an interaction gesture; acquiring the position of a key point of the hand of the first object in the canvas; modifying the canvas according to the interaction gesture and the position; and presenting the modified canvas to a second object. The method allows a user to display special text through the canvas during a video session, improving interaction efficiency.

Description

Interaction method, interaction device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, in particular to the field of computer vision, and more particularly to an interaction method, an interaction apparatus, an electronic device, and a storage medium.
Background
In recent years, gesture recognition based on computer vision has developed rapidly. Using only video capture equipment such as cameras, together with machine learning and deep learning techniques, it can recognize human gestures without wearable devices or external sensors. Gesture recognition can be divided into static gesture recognition and dynamic gesture recognition (two-dimensional and three-dimensional); static and two-dimensional dynamic gesture recognition require only a two-dimensional camera system and are therefore widely applied.
Video conferencing has become an indispensable form of interaction in work and daily life. Participants often need to present special text such as drawn symbols and figures during a video conference, and existing video-conference interaction modes make it difficult to convey such information efficiently and directly.
Disclosure of Invention
Embodiments of the present disclosure provide an interaction method, an interaction apparatus, an electronic device, and a storage medium.
In a first aspect, embodiments of the present disclosure provide an interaction method, including: generating a canvas in response to a received trigger instruction; acquiring a hand image of a first object in real time and determining an interaction gesture; acquiring the position of a key point of the hand of the first object in the canvas; modifying the canvas according to the interaction gesture and the position; and presenting the modified canvas to a second object.
In a second aspect, embodiments of the present disclosure provide an interaction apparatus, including: a canvas generation unit configured to generate a canvas in response to a received trigger instruction; a gesture determination unit configured to acquire a hand image of a first object in real time and determine an interaction gesture; a position acquisition unit configured to acquire the position of a target key point of the hand of the first object in the canvas; a canvas modification unit configured to modify the canvas according to the interaction gesture and the position; and a canvas presentation unit configured to present the modified canvas to a second object.
In a third aspect, embodiments of the present disclosure provide an electronic device comprising a memory, a processor, a bus, and a computer program stored on the memory and executable on the processor, the processor implementing the interaction method as described in the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the interaction method as described in the first aspect.
It should be understood that this section is not intended to identify key or critical features of the embodiments of the disclosure, nor to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram to which one embodiment of the interaction method of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of an interaction method of the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of the interaction method of the present disclosure;
FIG. 4 is a schematic diagram of an interactive interface of a video conferencing application;
FIG. 5 is a schematic diagram of gesture 1;
FIG. 6 is a flow diagram of another embodiment of an interaction method of the present disclosure;
FIG. 7 is a schematic diagram of key points of a hand;
FIGS. 8a-8e are schematic diagrams of five preset gestures;
FIG. 9 is a schematic diagram of another interactive interface of a video conferencing application;
FIG. 10 is a schematic structural view of one embodiment of an interaction device of the present disclosure;
FIG. 11 is a schematic structural view of an embodiment of an electronic device of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. In order to make the technical scheme and advantages of the present disclosure more apparent, the present disclosure will be further described in detail below with reference to the accompanying drawings and specific embodiments.
FIG. 1 illustrates an exemplary system architecture 100 in which embodiments of the interaction methods or interaction devices of the present disclosure may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a video conference type application, etc., may be installed on the terminal devices 101, 102, 103. The terminal devices 101, 102, 103 may also be equipped with microphone arrays, image pickup devices, speakers, and the like.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices including, but not limited to, smartphones, tablets, in-vehicle computers, laptop and desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.
The server 105 may be a server providing various services, for example a background server supporting video conferencing applications installed on the terminal devices 101, 102, 103. The background server may receive instructions from users during video sessions conducted through the respective terminal devices 101, 102, 103 and return corresponding feedback.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.
It should be noted that the interaction method provided by the embodiments of the present disclosure is generally performed by the server 105. Accordingly, the interaction means is typically provided in the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2 illustrates a flow 200 of one embodiment of an interaction method of the present disclosure. As shown in fig. 2, the interaction method of the present embodiment may include the following steps:
In step 201, a canvas is generated in response to a received trigger instruction.
In this embodiment, the execution body of the interaction method (e.g., the server 105 shown in fig. 1) may provide a video session service for each terminal device (e.g., the terminal devices 101, 102, 103 shown in fig. 1). The user of each terminal device can participate in a video session as a session party through the video conference application installed on that device. Any party participating in the video session may send a trigger instruction to the execution body, for example by clicking a button in the video conference application or by inputting an instruction through an input device connected to the terminal device (such as a keyboard shortcut).
After receiving the trigger instruction, the execution body may generate a canvas. Here, the execution body may control the user terminal that sent the trigger instruction to generate the initial canvas, or may control every session party participating in the video session to generate the initial canvas. The size and position of the initial canvas may be preset. To avoid occluding the original video frame, the initial canvas may be semi-transparent; alternatively, each session party may adjust parameters such as the colour, transparency, position and size of the initial canvas according to personalised requirements. In this embodiment, the canvas can be used by the user to present special text.
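By way of a non-limiting illustration, the adjustable canvas parameters described above could be grouped as in the following sketch; all field names and default values here are assumptions rather than part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class CanvasConfig:
    """Hypothetical per-session-party canvas parameters (names are illustrative)."""
    width: int = 1280               # canvas size in pixels
    height: int = 720
    position: tuple = (0, 0)        # offset of the canvas inside the video frame
    color: tuple = (255, 255, 255)  # background colour
    alpha: float = 0.4              # semi-transparent so the video picture is not occluded

# A session party could override any of these according to personalised requirements.
initial_canvas = CanvasConfig(alpha=0.3)
```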
Step 202, acquiring a hand image of a first object in real time, and determining an interactive gesture.
The execution body may also acquire the hand image of the first object through the image acquisition device of the terminal used by each session party. Specifically, the execution body may first determine the hand image in various ways, for example by taking the entire video frame of the sender as the hand image, or by taking a specified area of the video frame as the hand image. The execution body may then perform gesture recognition on the hand image using various deep learning algorithms (e.g., a single-shot multibox detection model). Here, the first object may be any session party, or the sender of the trigger instruction. The execution body can perform gesture recognition on the hand image acquired in real time to determine the interaction gesture. Specifically, it may perform keypoint recognition on the hand image, determine a plurality of hand keypoints, and determine the interaction gesture from the relative positions of these keypoints.
Step 203, acquiring the position of the key point of the hand of the first object in the canvas.
Simultaneously with or after generating the canvas, the execution body may determine the location of a keypoint of the hand of the first object in the canvas. The canvas may be the initial canvas or a modified canvas. Specifically, the execution body may take a keypoint at a preset position of the hand of the first object as the target keypoint, for example the tip of the index finger, or alternatively the centre of the palm.
When determining the location of the target keypoint in the canvas, the execution body may first determine the location of the target keypoint in the video image, and then determine its location in the initial canvas from the relative positional relationship between the centre of the initial canvas and the centre of the video image. Specifically, the execution body may determine a conversion matrix between the coordinate system of the video image and the coordinate system of the initial canvas, multiply the coordinates of the target keypoint in the video image by the conversion matrix, and take the resulting coordinates as the coordinates of the target keypoint in the canvas coordinate system.
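A minimal sketch of this mapping follows, assuming the conversion matrix is a 3x3 homogeneous transform derived from the relative positions of the two coordinate systems; the concrete matrix values are illustrative only:

```python
import numpy as np

def image_to_canvas(point_xy, transform):
    """Map a target keypoint from video-image coordinates to canvas coordinates."""
    p = np.array([point_xy[0], point_xy[1], 1.0])
    q = transform @ p                 # multiply by the conversion matrix
    return q[0] / q[2], q[1] / q[2]   # back to 2-D canvas coordinates

# Example: the canvas coordinate system is the image system shifted by (dx, dy).
dx, dy = -40.0, -20.0
T = np.array([[1.0, 0.0, dx],
              [0.0, 1.0, dy],
              [0.0, 0.0, 1.0]])
print(image_to_canvas((320, 180), T))  # -> (280.0, 160.0)
```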
It will be appreciated that the execution body may process every frame of the video to determine the location of the target keypoint in the canvas for each frame. In some specific applications, frames may be skipped to increase processing speed.
Alternatively, the execution body may receive a coordinate sequence sent by the user terminal and use it directly as the position of the hand keypoint of the first object in the canvas. The coordinate sequence may be obtained through an input device connected to the terminal device used by the first object; for example, the first object may write on a touch screen (e.g., the screen of a tablet or notebook) or a drawing tablet, producing the coordinate sequence.
Step 204, modifying the canvas according to the interaction gesture and the positions.
In this embodiment, different interaction gestures may correspond to different interaction modes. Here, the interaction modes may include, but are not limited to: a selection mode (for selecting a function in the functional area), a writing mode (for writing on the canvas), an erasing mode (for erasing traces on the canvas), a recovery mode (for restoring a specified historical canvas), a canvas panning mode (for panning the canvas), and a recognition mode (for text or pattern recognition of traces on the canvas). In different interaction modes, a trace formed at a succession of positions has different meanings. For example, in the selection mode the user may make a selection by clicking a button in the functional area through an external input device (such as a mouse or touch screen), or by recognition of the index fingertip position under a specified gesture. In the writing mode, the trace is displayed on the canvas; in the erasing mode, the trace is erased.
The execution body may first determine the corresponding interaction mode from the interaction gesture, and then modify the canvas in that interaction mode according to the determined positions. Such modifications may include, but are not limited to: displaying handwriting, erasing handwriting, moving the canvas, and so on.
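By way of a non-limiting illustration, the gesture-to-mode mapping and the mode-dependent modification could be organised as a simple dispatch; the gesture labels and the canvas methods below are assumptions, not an API defined by the disclosure:

```python
from enum import Enum, auto

class Mode(Enum):
    SELECT = auto()
    WRITE = auto()
    ERASE = auto()
    RECOVER = auto()
    PAN = auto()
    RECOGNIZE = auto()

# Hypothetical mapping from recognised interaction gestures to interaction modes.
GESTURE_TO_MODE = {"gesture_1": Mode.WRITE, "gesture_2": Mode.ERASE, "gesture_3": Mode.PAN}

def modify_canvas(canvas, gesture, positions):
    """Apply the positions to the canvas under the mode selected by the gesture."""
    mode = GESTURE_TO_MODE.get(gesture)
    if mode is Mode.WRITE:
        canvas.draw_polyline(positions)        # display handwriting
    elif mode is Mode.ERASE:
        canvas.erase_along(positions)          # erase traces under the path
    elif mode is Mode.PAN:
        dx = positions[-1][0] - positions[0][0]
        dy = positions[-1][1] - positions[0][1]
        canvas.translate(dx, dy)               # move the canvas
    return canvas
```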
Step 205, the modified canvas is presented to the second object.
The execution body may expose the modified canvas to the second object in real time. Here, the second object may include the first object and may further include the remaining parties participating in the video session. Alternatively, the second object may be an object specified by the first object.
With continued reference to fig. 3, a schematic diagram of one application scenario of the interaction method of the present disclosure is shown. In fig. 3, the user terminals 301, 302, 303 participate in a video conference through a video conference application, and one of the parties generates a trigger instruction by clicking the "gesture interaction" button of the menu bar in the video conference application interface (as shown in fig. 4). After receiving the trigger instruction, the server 304 may control the user terminals 301, 302, 303 to generate a canvas. The camera of the terminal captures the hand image of the sender of the trigger instruction and sends it to the server 304. By processing the hand image, the server 304 determines that the interaction gesture is gesture 1 (as shown in fig. 5). The server 304 then obtains the position of the hand keypoint of the first object in the canvas by analysing the position of the index fingertip in the hand image. Finally, the server 304 determines that the interaction mode corresponding to gesture 1 is the writing mode, writes the trace corresponding to the coordinate sequence on the canvas, and presents the canvas containing that trace to each user terminal 301, 302, 303.
The interaction method provided by this embodiment of the disclosure allows a user to invoke a canvas during a video session and display special text on it through gestures, thereby improving interaction efficiency.
With continued reference to fig. 6, a flow 600 of another embodiment of an interaction method according to the present disclosure is shown. As shown in fig. 6, the method of the present embodiment may include the steps of:
In step 601, a canvas is generated in response to the received trigger instruction.
In some alternative implementations of this embodiment, the generated canvas may include a plurality of regions including, but not limited to, a visible region, a buffer region and a functional region. Here, the visible region is the region in which material on the canvas is displayed; the buffer region is the region in which material on the canvas is cached; and the functional region provides entries for functional operations on the canvas.
In the initial canvas, the visible region may be centred so that the user can conveniently write in it. The buffer region lies outside the visible region, so that the user can drag the canvas to obtain a new writable area once the visible region is full. In some specific practices, the size of the buffer region may be N times the size of the visible region, where N may be an integer (e.g., 8). The functional region may be located at one side of the visible region, so that the user can conveniently invoke its buttons and the like to modify the canvas.
Step 602, obtaining a video image of a first object in real time; determining a hand image of the first object based on the video image; detecting hand key points of the hand image to obtain a plurality of hand key points; and determining the interaction gesture according to the relative positions among the plurality of hand key points.
In this embodiment, if the execution body receives a trigger instruction from any session party during the video session, it may first take the sender of the trigger instruction as the first object, and then acquire a video image of the first object in real time.
The execution body may further analyse the video image to determine the hand image of the first object. Specifically, it may take a designated area of the video image as the hand image, or it may first detect the palm region of the first object and then derive the hand image from it. In particular, the execution body may process the video image with a single-shot multibox detection (SSD) model. The detection target is the palm rather than the entire hand region, for two reasons: first, because fingers are highly mobile, the shape of the palm is relatively fixed compared with the whole hand and is simpler to detect; second, in most cases the palm region is approximately square, and with this prior information the number of anchor boxes can be significantly reduced. During training, the SSD model produces a large number of anchor boxes, of which only a small portion containing the palm are assigned positive labels, resulting in a large number of negative labels. To balance the numbers of positive and negative anchor labels, the loss function of the detection model may be adjusted from the original cross-entropy loss to a cross-entropy loss with class weights and a difficulty-based weighting factor. After the palm region is determined, the execution body may further determine the hand image in a number of ways, for example by extending the upper edge of the palm region upward by a preset size, or by enlarging the palm region outward by a factor of 2 about its centre and taking the resulting image as the hand image.
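The adjusted loss described above resembles a class-weighted focal loss; the following PyTorch sketch shows one possible form, and the weighting scheme and values are assumptions, since the disclosure does not give the exact formula:

```python
import torch
import torch.nn.functional as F

def weighted_focal_ce(logits, targets, class_weight, gamma=2.0):
    """Cross-entropy with per-class weights and a difficulty factor.

    `class_weight` rebalances the scarce positive (palm) anchors against the
    many negative ones, and the (1 - p)^gamma factor down-weights easy anchors.
    """
    log_p = F.log_softmax(logits, dim=1)
    p = log_p.exp()
    ce = F.nll_loss(log_p, targets, weight=class_weight, reduction="none")
    pt = p.gather(1, targets.unsqueeze(1)).squeeze(1)  # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

# Toy usage: 4 anchors, 2 classes (background / palm).
logits = torch.randn(4, 2)
targets = torch.tensor([0, 0, 0, 1])
w = torch.tensor([0.25, 0.75])  # heavier weight on the rare palm class
print(weighted_focal_ce(logits, targets, w))
```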
After the hand image is obtained, the executing body can detect the hand key points of the hand image to obtain a plurality of hand key points. Specifically, the executing body may input each hand image into a pre-trained keypoint detection model, to obtain a plurality of keypoint coordinates. The positions of the key points are shown in fig. 7, and each key point represents a joint of the hand.
According to the relative positions of the keypoints, the execution body can determine the interaction gesture. Specifically, it may first identify the keypoints of each finger and then calculate the angle of each finger; from these angles the open or closed state of each finger can be determined. The execution body may predefine several gestures through which the user can interact with it. Schematic diagrams of these gestures are shown in figs. 8a to 8e.
The gesture shown in fig. 8a is recognized as follows: the thumb, ring finger and little finger are closed, and the index finger and middle finger are open. The gesture shown in fig. 8b: the thumb and index finger are closed, and the middle finger, ring finger and little finger are open. The gesture shown in fig. 8c: all five fingers are closed. The gesture shown in fig. 8d: the middle finger and ring finger are closed, and the thumb, index finger and little finger are open. The gesture shown in fig. 8e: the thumb and index finger are open, and the middle finger, ring finger and little finger are closed.
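A sketch of such angle-based open/closed classification is given below, assuming the 21-point indexing of fig. 7; the angle threshold and the exact joint triplets used per finger are assumptions:

```python
import numpy as np

# Landmark index groups per finger (0 = wrist, 4/8/12/16/20 = fingertips); assumed layout.
FINGERS = {
    "thumb":  [1, 2, 3, 4],
    "index":  [5, 6, 7, 8],
    "middle": [9, 10, 11, 12],
    "ring":   [13, 14, 15, 16],
    "little": [17, 18, 19, 20],
}

def finger_is_open(pts, ids, threshold_deg=160.0):
    """A finger counts as open when its joint chain is nearly straight."""
    a, b, c = pts[ids[0]], pts[ids[1]], pts[ids[3]]
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-6)
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angle > threshold_deg

# Open/closed pattern -> gesture (order: thumb, index, middle, ring, little), per figs. 8a-8e.
GESTURES = {
    (False, True, True, False, False):   "fig_8a",
    (False, False, True, True, True):    "fig_8b",
    (False, False, False, False, False): "fig_8c",
    (True, True, False, False, True):    "fig_8d",
    (True, True, False, False, False):   "fig_8e",
}

def classify(pts):
    """pts: (21, 2) array of hand keypoint coordinates."""
    state = tuple(finger_is_open(pts, ids) for ids in FINGERS.values())
    return GESTURES.get(state, "unknown")
```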
In step 603, in response to determining that the interaction gesture belongs to a preset interaction gesture set, it is determined that the interaction gesture meets a preset condition.
After determining the interaction gesture, the execution body may determine whether the interaction gesture belongs to a preset interaction gesture set. Here, the preset interaction gesture set may include the gestures shown in figs. 8a to 8e. If the interaction gesture belongs to the set, it is determined that the interaction gesture meets the preset condition.
Step 604, obtaining the position of the key point of the preset hand in the canvas.
Step 605, determining a position of a key point of a hand of the first object in the canvas according to a coordinate sequence received from the external device.
In this embodiment, the execution body may determine the location of the hand keypoint of the first object in the initial canvas through step 604 and/or step 605. In step 604, the execution body may select, from the 21 hand keypoints, the keypoint representing the index fingertip (the keypoint numbered 8) as the target keypoint, and then determine the location of the target keypoint in the canvas. In step 605, the execution body may obtain a coordinate sequence sent by an external device and use it as the position of the hand keypoint of the first object in the canvas.
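As one possible, non-limiting realisation of step 604, the sketch below uses MediaPipe Hands in place of the disclosure's own keypoint model; its 21-landmark layout matches fig. 7, with landmark 8 being the index fingertip:

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)

def index_tip_position(bgr_frame):
    """Return the index-fingertip (keypoint 8) position in video-image pixels, or None."""
    result = hands.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark[8]
    h, w = bgr_frame.shape[:2]
    # This pixel position would then be mapped into canvas coordinates as in step 203.
    return int(lm.x * w), int(lm.y * h)
```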
Step 606, determining a corresponding interaction mode according to the interaction gesture; determining display materials according to the interaction mode and the determined position; and displaying the display material on a canvas.
In this embodiment, different interaction gestures may correspond to different interaction modes, and finger operations have different meanings in different modes. The execution body may first determine the interaction mode corresponding to the interaction gesture and the operation corresponding to the positions of the target keypoint in the canvas, look up the operation instruction corresponding to that operation in that interaction mode, modify the canvas according to the operation instruction to determine the presentation material, and finally display the presentation material on the canvas. For example, in the panning mode, the operation instruction corresponding to a rightward slide is to move the canvas to the right; the execution body moves the canvas accordingly, and the content shown by the moved canvas in the visible region becomes the presentation material. In the writing mode, the operation instruction corresponding to a rightward slide is to write a line segment to the right; the execution body forms the line segment on the canvas and displays it as the presentation material.
In some optional implementations of this embodiment, if the interaction mode is the writing mode, the execution body may connect the determined positions to obtain an initial trajectory. Specifically, the positions may be connected in order of their timestamps. The execution body may then adjust the initial trajectory and take the adjusted target trajectory as the presentation material. It should be noted that, to improve the viewing experience of every session party, the execution body may adjust the initial trajectory synchronously.
The adjustment may include smoothing the initial trajectory, removing outliers, and so on. For example, a nearly straight curve drawn by gesture may be smoothed into a straight line, and "burrs" on an otherwise smooth curve may be removed.
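A minimal sketch of such an adjustment follows (outlier removal plus Savitzky-Golay smoothing); the jump threshold and window size are illustrative assumptions:

```python
import numpy as np
from scipy.signal import savgol_filter

def adjust_track(points, max_jump=80.0):
    """Drop implausible jumps ("burrs"), then smooth the remaining trajectory."""
    pts = np.asarray(points, dtype=float)
    keep = [0]
    for i in range(1, len(pts)):
        # Keep a point only if it is reasonably close to the last kept point.
        if np.linalg.norm(pts[i] - pts[keep[-1]]) < max_jump:
            keep.append(i)
    pts = pts[keep]
    if len(pts) < 7:
        return pts
    # Savitzky-Golay smoothing of x and y independently (window 7, order 2).
    xs = savgol_filter(pts[:, 0], 7, 2)
    ys = savgol_filter(pts[:, 1], 7, 2)
    return np.column_stack([xs, ys])
```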
In some alternative implementations of the present embodiment, the adjusting may further include: performing shape recognition on the initial track to obtain a corresponding target shape; and determining the parameters of the target shape, and adjusting the initial track according to the parameters to obtain the target track.
In this implementation, the execution body may perform shape recognition on the initial trajectory to obtain a corresponding target shape. Specifically, it may connect the start point and end point of the initial trajectory and determine the maximum connected component, compare the shape of the maximum connected component with standard shape templates, and take the best-matching template as the target shape. Alternatively, the execution body may detect each candidate target shape with an algorithm specific to that shape, for example detecting circles with the Hough circle transform.
The execution body may then determine the parameters of the target shape from the positions and number of pixels in the maximum connected component. If the target shape is a circle, its radius and centre can be estimated; if it is a square, its side length and centre can be estimated.
Once the parameters are determined, a standard figure with those parameters can be drawn directly on the canvas to replace the hand-drawn figure. Alternatively, the hand-drawn figure may be compared with the standard figure corresponding to the parameters, and only the portions whose deviation exceeds a threshold adjusted.
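For the circle case, a sketch using OpenCV's Hough circle transform is shown below; all parameter values are illustrative assumptions, and `track_img` is assumed to be a single-channel image with the initial trajectory drawn in white on black:

```python
import cv2

def fit_circle_if_any(track_img):
    """Return (cx, cy, r) for a detected circle in the rasterised trajectory, or None."""
    blurred = cv2.GaussianBlur(track_img, (5, 5), 0)
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1.2, minDist=50,
                               param1=80, param2=30, minRadius=10, maxRadius=0)
    if circles is None:
        return None
    cx, cy, r = circles[0, 0]
    # A standard circle with these parameters could replace the hand-drawn one.
    return float(cx), float(cy), float(r)
```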
In some optional implementations of this embodiment, when the interaction mode is the recognition mode, the execution body performs text recognition on the target trajectory, takes the obtained text recognition result as the presentation material, and displays the presentation material on the canvas.
In this embodiment, the execution body may thus implement text recognition, that is, perform text recognition on the target trajectory to obtain a text recognition result, using any of a variety of text recognition algorithms. It will be appreciated that if the target trajectory contains a recognized target shape, that part of the trajectory is excluded from text recognition.
The execution body may output the text recognition result so that every session party can see it on the canvas. Further, each party may revise the text recognition result, including but not limited to additions and deletions. The execution body may record the specifics of each party's revision, including the reviser, the revision time, the revision content, and so on.
Optionally, the execution body may also organise notes from the text recognition result. Specifically, the recognized text and the special text can be typeset according to the writer's writing positions, and the content of the canvas can then be saved as a file of a specified type.
In this embodiment, because text recognition is time-consuming, it may be executed through an asynchronous call so that operations remain smooth during the video session.
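A sketch of such an asynchronous call is given below; `recognize_text` stands in for whatever text-recognition routine is used and is a hypothetical name:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=1)

def recognize_text(track_image):
    """Placeholder for the (slow) handwriting-recognition call; assumed interface."""
    raise NotImplementedError

async def recognize_async(track_image, on_done):
    """Run recognition off the event loop so the video session stays responsive."""
    loop = asyncio.get_running_loop()
    text = await loop.run_in_executor(_executor, recognize_text, track_image)
    on_done(text)  # e.g. draw the result in the recognition area of the interface
```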
To make it easy for the parties in the video conference to view the target trajectory, a recognition area (as shown in fig. 9) may be defined in the video conference interface for displaying the target trajectory and the recognized text.
In addition, to make it easy for each session party to exit the interaction scenario described in this embodiment, a video playback area (as shown in fig. 9) may be defined in the video conference interface; it keeps displaying the video picture of each session party during the conference. Any session party can return to the video picture and hide the canvas by clicking the video playback area.
In some alternative implementations of the present embodiment, the execution body may store the target trajectory and the text recognition result.
In some alternative implementations of this embodiment, the execution body may also receive a call instruction for a stored historical target trajectory and/or historical text recognition result. The execution body may parse the call instruction to determine the time information, text information, symbol information, and the like it contains, compare the time information with the stored times of the historical target trajectories and/or historical text recognition results, and take the matching items as the call result. Alternatively, the execution body may search the historical text results and, if the text information in the call instruction is found, take the corresponding historical text result as the call result.
In some optional implementations of this embodiment, when the interaction mode is the recovery mode, the execution body may also receive a recovery instruction sent by the first object. Here, the recovery instruction restores the trajectory to its state N frames earlier. In some specific applications, each time the user triggers a recovery instruction, the initial trajectory is restored to its state 10 frames earlier.
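One way to support this is sketched below, under the assumption that a canvas snapshot is kept per processed frame; the class and its capacity are illustrative, not part of the disclosure:

```python
from collections import deque

class CanvasHistory:
    """Bounded per-frame snapshot history backing the recovery mode."""

    def __init__(self, max_frames=300):
        self._snapshots = deque(maxlen=max_frames)

    def push(self, canvas_state):
        """Record the canvas state for the current frame."""
        self._snapshots.append(canvas_state)

    def recover(self, n=10):
        """Roll the canvas back to its state n frames earlier."""
        for _ in range(min(n, len(self._snapshots) - 1)):
            self._snapshots.pop()
        return self._snapshots[-1] if self._snapshots else None
```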
The interaction method provided by this embodiment allows the user to display special text through the canvas during a video session, which improves interaction efficiency and the interaction experience. The special text displayed on the canvas can also be further processed so that it is more standard and more easily understood by the other session parties, improving the interaction efficiency of the video conference.
In addition, the technical solution of the present disclosure also provides an interaction apparatus.
As shown in fig. 10, an interaction device 1000 in a specific embodiment of the disclosure includes: a canvas generation unit 1001, a gesture determination unit 1002, a position acquisition unit 1003, a canvas modification unit 1004, and a canvas presentation unit 1005.
The canvas generation unit 1001 is configured to generate a canvas in response to the received trigger instruction.
The gesture determining unit 1002 is configured to acquire a hand image of the first object in real time, and determine an interaction gesture.
The position acquisition unit 1003 is configured to acquire a position of a target key point of a hand of the first object in the canvas.
And a canvas modifying unit 1004 configured to modify the canvas according to the interactive gesture and the position.
A canvas presentation unit 1005 configured to present the modified canvas to the second object.
In addition, the technical solution of the present disclosure also provides an electronic device.
Fig. 11 shows a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
As shown in fig. 11, the electronic device may include a processor 1101, a memory 1102, a bus 1103, and a computer program stored on the memory 1102 and executable on the processor 1101, wherein the processor 1101 and the memory 1102 communicate with each other via the bus 1103. When the processor 1101 executes the computer program, it implements the steps of the above method, for example: generating a canvas in response to a received trigger instruction; acquiring a hand image of a first object in real time and determining an interaction gesture; acquiring the position of a key point of the hand of the first object in the canvas; modifying the canvas according to the interaction gesture and the position; and presenting the modified canvas to a second object.
In addition, in one embodiment of the present disclosure, there is also provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method, for example, including: responding to the received trigger instruction, and generating canvas; acquiring a hand image of a first object in real time, and determining an interactive gesture; acquiring the position of a key point of a hand of a first object in a canvas; modifying the canvas according to the interaction gesture and the position; the modified canvas is presented to the second object.
In summary, the technical solution of the present disclosure allows a session party to issue a trigger instruction during a video conference to generate a canvas, and modifies the canvas according to gestures and positions, so that special symbols can be displayed on the canvas and the efficiency of video interaction can be improved.
The foregoing description of the preferred embodiments of the present disclosure is not intended to limit the disclosure, but rather to cover all modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present disclosure.

Claims (11)

1. An interaction method, comprising:
generating a canvas in response to a received trigger instruction;
acquiring a hand image of a first object in real time, and determining an interaction gesture;
acquiring the position of a key point of the hand of the first object in the canvas;
modifying the canvas according to the interaction gesture and the position; and
presenting the modified canvas to a second object.
2. The method of claim 1, wherein acquiring the hand image of the first object in real time and determining the interaction gesture comprises:
acquiring a video image of the first object in real time;
determining a hand image of the first object based on the video image;
detecting hand key points of the hand image to obtain a plurality of hand key points;
and determining the interaction gesture according to the relative positions among the plurality of hand key points.
3. The method of claim 1, wherein the canvas comprises a visible region, a buffer region and a functional region, the visible region being located at the centre of the initial canvas, the buffer region being located outside the visible region, and the functional region being located at one side of the visible region.
4. The method of claim 1, wherein the obtaining the location of the keypoints of the hand of the first object in the canvas comprises:
acquiring the position of a preset hand key point in the canvas; or
determining the position of the key point of the hand of the first object in the canvas according to a coordinate sequence received from an external device.
5. The method of claim 1, wherein the modifying the canvas according to the interaction gesture and the location comprises:
determining a corresponding interaction mode according to the interaction gesture, wherein the interaction mode comprises: a selection mode, a writing mode, an erasing mode, a recovery mode, a canvas panning mode and a recognition mode;
determining display materials according to the interaction mode and the position;
and displaying the display material on the canvas.
6. The method of claim 5, wherein the determining presentation materials from the interaction pattern and the location comprises:
connecting the determined positions to obtain an initial track;
and adjusting the initial track, and taking the adjusted target track as a display material.
7. The method of claim 6, wherein the adjusting the initial trajectory comprises at least one of:
smoothing the initial track;
and carrying out shape recognition on the initial track, and adjusting the position of the initial track according to parameters corresponding to the shape recognition result.
8. The method of claim 5, wherein when the interaction mode is a recognition mode, text recognition is performed on the target track, and the obtained text recognition result is used as a display material; and
and displaying the display material on the canvas.
9. An interaction device, comprising:
a canvas generation unit configured to generate a canvas in response to a received trigger instruction;
a gesture determination unit configured to acquire a hand image of a first object in real time and determine an interaction gesture;
a position acquisition unit configured to acquire a position of a target key point of a hand of the first object in the canvas;
a canvas modification unit configured to modify the canvas according to the interaction gesture and the position;
and a canvas presentation unit configured to present the modified canvas to the second object.
10. An electronic device comprising a memory, a processor, a bus and a computer program stored on the memory and executable on the processor, wherein the processor implements the interaction method of any one of claims 1 to 8 when executing the computer program.
11. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the interaction method according to any of claims 1 to 8.
CN202310126676.4A 2023-02-01 2023-02-01 Interaction method, interaction device, electronic equipment and storage medium Pending CN116149477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310126676.4A CN116149477A (en) 2023-02-01 2023-02-01 Interaction method, interaction device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310126676.4A CN116149477A (en) 2023-02-01 2023-02-01 Interaction method, interaction device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116149477A true CN116149477A (en) 2023-05-23

Family

ID=86357862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310126676.4A Pending CN116149477A (en) 2023-02-01 2023-02-01 Interaction method, interaction device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116149477A (en)

Similar Documents

Publication Publication Date Title
WO2018177379A1 (en) Gesture recognition, gesture control and neural network training methods and apparatuses, and electronic device
US10984226B2 (en) Method and apparatus for inputting emoticon
JP2022116104A (en) Managing real-time handwriting recognition
US10095033B2 (en) Multimodal interaction with near-to-eye display
EP3769509B1 (en) Multi-endpoint mixed-reality meetings
CN109242765B (en) Face image processing method and device and storage medium
KR20210093375A (en) Devices, methods, and graphical user interfaces for wireless pairing with peripheral devices and displaying status information concerning the peripheral devices
WO2021213067A1 (en) Object display method and apparatus, device and storage medium
CN110119700B (en) Avatar control method, avatar control device and electronic equipment
CN110555507B (en) Interaction method and device for virtual robot, electronic equipment and storage medium
CN108616712B (en) Camera-based interface operation method, device, equipment and storage medium
CN111860362A (en) Method and device for generating human face image correction model and correcting human face image
CN112527115B (en) User image generation method, related device and computer program product
KR20170009979A (en) Methods and systems for touch input
US20230244379A1 (en) Key function execution method and apparatus, device, and storage medium
CN113806054A (en) Task processing method and device, electronic equipment and storage medium
CN113723327A (en) Real-time Chinese sign language recognition interactive system based on deep learning
US9465534B2 (en) Method for automatic computerized process control, and computerized system implementing the same
US20210158031A1 (en) Gesture Recognition Method, and Electronic Device and Storage Medium
CN106547339B (en) Control method and device of computer equipment
CN116149477A (en) Interaction method, interaction device, electronic equipment and storage medium
CN115016641A (en) Conference control method, device, conference system and medium based on gesture recognition
CN115484411A (en) Shooting parameter adjusting method and device, electronic equipment and readable storage medium
Ahamed et al. Efficient gesture-based presentation controller using transfer learning algorithm
CN110263743B (en) Method and device for recognizing images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination