WO2020228643A1 - Interaction control method and apparatus, electronic device, and storage medium - Google Patents


Info

Publication number
WO2020228643A1
Authority
WO
WIPO (PCT)
Prior art keywords
preset
key points
coordinates
dimensional coordinates
screen space
Prior art date
Application number
PCT/CN2020/089448
Other languages
English (en)
French (fr)
Inventor
卓世杰
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Priority to EP20805347.0A (published as EP3971685A4)
Publication of WO2020228643A1
Priority to US17/523,265 (published as US20220066545A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F3/03 Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304 Detection arrangements using opto-electronic means
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/11 Hand-related biometrics; Hand pose recognition
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to an interactive control method, an interactive control device, electronic equipment, and a computer-readable storage medium.
  • the position information of the hand obtained by hand tracking is only the two-dimensional coordinates on the screen space.
  • the step of estimating two-dimensional coordinates into three-dimensional coordinates may have relatively large errors, making the estimated three-dimensional coordinates inaccurate, resulting in inaccurate interaction.
  • the process of estimating the three-dimensional coordinates may result in low operation efficiency and affect the interactive experience.
  • the purpose of the present disclosure is to provide an interactive control method, an interactive control device, an electronic device, and a computer-readable storage medium, so as to overcome, at least to some extent, the problem that precise interaction cannot be achieved due to the limitations and defects of the related art.
  • an interactive control method is provided, which includes: obtaining the screen space coordinates of key points of a preset part, and obtaining the real distance of the key points of the preset part relative to the shooting device; combining the real distance with the screen space coordinates to determine the three-dimensional coordinates of the key points of the preset part in the virtual world; determining the spatial relationship between the key points of the preset part and the virtual objects in the virtual world according to the three-dimensional coordinates; and controlling the key points of the preset part to interact with the virtual objects based on the spatial relationship.
  • obtaining the screen space coordinates of the key points of the preset part includes: obtaining a first image containing the preset part collected by a monocular camera; and performing key point detection on the first image to obtain the screen space coordinates of the key points of the preset part.
  • performing key point detection on the first image to obtain the screen space coordinates of the key points of the preset part includes: processing the first image with a trained convolutional neural network model to obtain the key points of the preset part; and performing regression processing on the key points of the preset part to obtain their position information and using that position information as the screen space coordinates.
  • the photographing device includes a depth camera
  • acquiring the real distance of the key points of the preset part relative to the photographing device includes: acquiring a second image containing the preset part collected by the depth camera; aligning the first image and the second image; and sampling at the screen space coordinates on the aligned second image to obtain the real distance from the key points of the preset part to the depth camera.
  • combining the real distance with the screen space coordinates to determine the three-dimensional coordinates of the key points of the preset part in the virtual world includes: obtaining the three-dimensional coordinates of the key points of the preset part in the projection space according to the real distance and the screen space coordinates; determining a projection matrix according to the field of view angle of the photographing device; and converting the three-dimensional coordinates in the projection space into the three-dimensional coordinates in the virtual world based on the projection matrix.
  • determining, according to the three-dimensional coordinates, the spatial relationship between the key points of the preset part and the virtual object in the virtual world, and controlling the key points of the preset part to interact with the virtual object based on the spatial relationship, includes: obtaining the three-dimensional coordinates in the virtual world of the key points of the preset part used for interacting with the virtual object; calculating the distance between the three-dimensional coordinates and the coordinates of the virtual object; and, if the distance meets the preset distance, triggering the key points of the preset part to interact with the virtual object.
  • triggering the interaction between the key points of the preset part and the virtual object includes: identifying the current action of the key points of the preset part; matching the current action against multiple predetermined actions; and interacting with the virtual object in response to the current action according to the matching result, wherein the multiple predetermined actions correspond one-to-one to the interactive operations.
  • an interactive control device is provided, including: a parameter acquisition module, configured to acquire the screen space coordinates of the key points of a preset part and the real distance of the key points of the preset part relative to the shooting device; a three-dimensional coordinate calculation module, configured to combine the real distance with the screen space coordinates to determine the three-dimensional coordinates of the key points of the preset part in the virtual world; and an interaction execution module, configured to determine, according to the three-dimensional coordinates, the spatial relationship between the key points of the preset part and the virtual object in the virtual world, and to control the key points of the preset part to interact with the virtual object based on the spatial relationship.
  • an electronic device is provided, including: a processor; and a memory configured to store executable instructions of the processor; wherein the processor is configured to perform any one of the interactive control methods described above by executing the executable instructions.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the interactive control method described in any one of the above is implemented.
  • the screen space coordinates of the key points of the preset part and their real distance to the shooting device are combined to obtain the three-dimensional coordinates of those key points in the virtual world, which avoids the step of estimating the three-dimensional coordinates, reduces the error introduced by estimation, improves accuracy, yields accurate three-dimensional coordinates, and enables precise interaction based on them.
  • since the three-dimensional coordinates of the key points of the preset part can be obtained by combining the screen space coordinates with the real distance, no coordinate estimation is needed; this improves computational efficiency and quickly yields the accurate three-dimensional coordinates of the key points of the preset part in the virtual world.
  • through the spatial relationship, determined from the three-dimensional coordinates, between the key points of the preset part and the virtual objects in the virtual world, the interaction of those key points with the virtual objects can be controlled accurately, thereby improving the user experience.
  • Fig. 1 schematically illustrates an interactive control method in an exemplary embodiment of the present disclosure.
  • Fig. 2 schematically shows a flowchart of determining screen space coordinates in an exemplary embodiment of the present disclosure.
  • Fig. 3 schematically illustrates the key points of a hand in an exemplary embodiment of the present disclosure.
  • Fig. 4 schematically shows a flowchart of determining the real distance in an exemplary embodiment of the present disclosure.
  • Fig. 5 schematically shows a flowchart of calculating three-dimensional coordinates in the virtual world in an exemplary embodiment of the present disclosure.
  • FIG. 6 schematically shows a flowchart of the interaction between the key points of the preset part and the virtual object in an exemplary embodiment of the present disclosure.
  • FIG. 7 schematically shows a specific flowchart of triggering the key points of the preset part to interact with the virtual object in an exemplary embodiment of the present disclosure.
  • Fig. 8 schematically shows an overall flowchart of the interaction between the key points of the preset part and the virtual object in an exemplary embodiment of the present disclosure.
  • Fig. 9 schematically shows a block diagram of an interactive control device in an exemplary embodiment of the present disclosure.
  • FIG. 10 schematically illustrates an electronic device in an exemplary embodiment of the present disclosure.
  • FIG. 11 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the present disclosure.
  • an interactive control method is provided, which can be applied to any augmented reality scene in games, education, daily life, and other application scenarios based on augmented reality.
  • With reference to FIG. 1, the interactive control method in this exemplary embodiment will be described in detail.
  • step S110 the screen space coordinates of the key points of the preset part are acquired, and the real distance of the key points of the preset part relative to the shooting device is acquired.
  • the preset part may be any part that can interact with virtual objects in the virtual world (virtual space), for example, including but not limited to the user's hand or head, etc.
  • the preset part is the user's hand as an example, and the hand here refers to one hand or two hands that the user interacts with the virtual object.
  • the screen space coordinates refer to the two-dimensional coordinates (X and Y coordinates) in the image space displayed on the screen.
  • the screen space coordinates are not affected by the position of the object in space, but only by the object itself and the viewport.
  • the screen space coordinates of the key points of the hand can be obtained by detecting the key points of the hand.
  • Hand keypoint detection is the process of identifying joints on fingers and fingertips in images containing hands.
  • the key point is an abstract description of a fixed area, which not only represents a point of information or location, but also represents the combined relationship between the context and the surrounding neighborhood.
  • Figure 2 shows a specific flow chart for obtaining screen space coordinates.
  • the step of obtaining the screen space coordinates of the key points of the preset part may include step S210 and step S220, wherein:
  • step S210 a first image collected by a monocular camera including the preset part is acquired.
  • the monocular camera reflects the three-dimensional world in a two-dimensional form.
  • the monocular camera here can be set on a mobile phone or on a shooting device such as a camera for capturing images.
  • the first image refers to a color image taken by a monocular camera.
  • the monocular camera can collect a color image including the hand from any angle and any distance. The angle and distance are not specifically limited here, as long as the hand can be clearly displayed.
  • step S220 key point detection is performed on the first image to obtain the screen space coordinates of the key point of the preset part.
  • Based on the color image obtained in step S210, key point detection may be performed on the preset part.
  • the specific process of performing key point detection on the preset position to obtain the screen space coordinates may include step S230 and step S240, where:
  • step S230 the first image is processed by the trained convolutional neural network model to obtain the key points of the preset part.
  • the convolutional neural network model can be trained first to obtain a trained model.
  • a small amount of labeled data containing a certain key point of the hand can be used to train the convolutional neural network model.
  • multiple shooting devices with different perspectives can be used to shoot the hand, and the above-mentioned convolutional neural network model can be used to preliminarily detect the key points.
  • These key points can be triangulated according to the pose of the shooting device to obtain the three-dimensional positions of the key points.
  • the calculated three-dimensional position is reprojected to each two-dimensional image with different perspectives, and the convolutional neural network model is trained using these two-dimensional images and key point annotations.
  • The resulting model is the trained convolutional neural network model. Further, the color image containing the hand collected in step S210 may be input to the trained convolutional neural network model, through which the key points of the hand can be accurately detected.
  • step S240 regression processing is performed on the key points of the preset part to obtain position information of the key points of the preset part and use the position information as the screen space coordinates.
  • regression processing refers to quantitatively describing the relationship between variables in the form of probability.
  • the model used for regression processing can be a linear regression model or a logistic regression model, etc., as long as the function can be realized.
  • the key points of the hand can be input into the regression model to obtain their position information, where the output corresponding to each hand key point is its X coordinate and Y coordinate in the image space.
  • the image coordinate system in the image space takes the center of the image plane as the coordinate origin, the X axis and the Y axis are respectively parallel to the two vertical sides of the image plane, and (X, Y) is used to represent the coordinate value.
  • Fig. 3 shows a schematic diagram of the key points of the hand.
  • 21 key points of the hand can be generated.
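The detection-plus-regression pipeline above ultimately yields an X and Y coordinate per key point in an image coordinate system whose origin is the center of the image plane. A minimal sketch of that final coordinate conversion (the function name and array shapes are illustrative, not taken from the patent):

```python
import numpy as np

def to_center_origin(keypoints_px, width, height):
    """Convert detected hand keypoints from top-left pixel coordinates to
    the center-origin image coordinate system described above (origin at
    the center of the image plane, axes parallel to its sides)."""
    kp = np.asarray(keypoints_px, dtype=float)   # shape (N, 2), e.g. N = 21
    center = np.array([width / 2.0, height / 2.0])
    return kp - center
```

For example, a keypoint detected at pixel (400, 300) in a 640x480 image maps to (80.0, 60.0) in this coordinate system.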
  • the real distance of the key point of the preset part relative to the photographing device can also be acquired.
  • the real distance refers to the real physical distance between the key point of the preset part and the shooting device, such as 1 meter, 2 meters, and so on.
  • FIG. 4 shows a schematic diagram of obtaining the true distance of the key points of the preset part relative to the shooting device. Referring to FIG. 4, it mainly includes steps S410 to S430, in which:
  • step S410 a second image collected by the depth camera and containing the preset part is acquired.
  • the photographing device refers to a depth camera used to photograph a second image including the hand, and the second image is a depth image photographed by the depth camera.
  • Depth cameras include, but are not limited to, TOF (Time of Flight) cameras, and may also be other cameras used to measure depth, such as infrared distance sensor cameras, structured light cameras, and laser structure cameras.
  • TOF camera is taken as an example for description.
  • the TOF camera can be composed of several units such as lens, light source, optical components, sensor, control circuit and processing circuit.
  • the TOF camera adopts the active light detection method, and its main purpose is to use the change of the incident light signal and the reflected light signal to measure the distance.
  • the principle of acquiring the second image of the hand by the TOF module includes: transmitting continuous near-infrared pulses to the target scene, and then using a sensor to receive the light pulses reflected by the hand. By comparing the phase difference between the emitted light pulse and the light pulse reflected by the hand, the transmission delay between the light pulses can be calculated to obtain the distance between the hand and the transmitter, and finally a depth image of the hand can be obtained.
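The phase-difference principle described above is commonly expressed, for a continuous-wave TOF camera, as d = c·Δφ/(4πf); the patent only states the principle, so the exact relation below is an assumption:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_distance(phase_shift_rad, mod_freq_hz):
    """Distance recovered from the phase difference between the emitted
    light pulse and the pulse reflected by the hand, using the standard
    continuous-wave TOF relation d = c * dphi / (4 * pi * f)."""
    return C * phase_shift_rad / (4.0 * math.pi * mod_freq_hz)
```

At a 20 MHz modulation frequency, a phase shift of pi/2 corresponds to a distance of about 1.87 m.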
  • Obtaining the second image of the hand by the depth camera can avoid the problems of increased cost and inconvenience caused by measuring depth information with other sensors outside the terminal.
  • the second image collected by the depth camera in step S410 and the first image collected by the monocular camera in step S210 are collected at the same time to ensure that the collected color image and the depth image have a one-to-one correspondence relationship.
  • step S420 an alignment operation is performed on the first image and the second image.
  • the alignment operation refers to an operation that makes the size of the color image and the depth image the same.
  • the alignment operation may be, for example, directly scaling the color image or the depth image, or performing post-processing on the depth image to enlarge its resolution.
  • other alignment methods may also be included, which are not specifically limited here.
  • step S430 the screen space coordinates are taken on the aligned second image to obtain the real distance from the key point of the preset position to the depth camera.
  • the screen space coordinates (X and Y) obtained in Figure 2 can be sampled directly on the aligned depth image to obtain the actual physical distance between the key points of the hand and the depth camera. By combining the screen space coordinates with the depth image, this real physical distance can be obtained accurately.
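The alignment and sampling steps above can be sketched roughly as follows; nearest-neighbor scaling is just one of the alignment options mentioned, and all names are illustrative:

```python
import numpy as np

def align_depth_to_color(depth, color_w, color_h):
    """Nearest-neighbor rescale of the depth image to the color image's
    resolution, one of the alignment options mentioned above."""
    dh, dw = depth.shape
    ys = np.arange(color_h) * dh // color_h   # source row for each color row
    xs = np.arange(color_w) * dw // color_w   # source column for each color column
    return depth[np.ix_(ys, xs)]

def depth_at(aligned_depth, x, y):
    """Real distance of the keypoint whose screen space coords are (x, y)."""
    return float(aligned_depth[int(round(y)), int(round(x))])
```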
  • step S120 the real distance and the screen space coordinates are combined to determine the three-dimensional coordinates of the key points of the preset position in the virtual world.
  • the virtual world refers to a virtual world formed by reconstructing an environment for placing virtual objects and for interaction. Since the coordinates obtained in step S110 are the coordinates of the key points of the hand in the projection space, in order to obtain the coordinates of the key points of the hand in the virtual world, the coordinates of the key points in the projection space can be converted.
  • Fig. 5 schematically shows the specific process of calculating the three-dimensional coordinates in the virtual world. With reference to Fig. 5, it mainly includes steps S510 to S530, in which:
  • step S510 the three-dimensional coordinates of the projection space of the key points of the preset part are obtained according to the real distance and the screen space coordinates.
  • the screen space coordinates refer to the two-dimensional coordinates of the key points of the preset part in the projection space.
  • the real distance between the key points of the preset part and the depth camera can be used as the Z-axis coordinate of those key points in the projection space; combining this real physical distance with the screen space coordinates yields the three-dimensional coordinates (X, Y, Z) of the key points of the preset part in the projection space.
  • the screen space coordinates of hand key point 1 obtained from color image 1 in the projection space are (1, 2)
  • the real physical distance of hand key point 1 obtained from depth image 2 to the depth camera is 0.5.
  • the three-dimensional coordinates of the hand key point 1 in the projection space are (1, 2, 0.5).
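Assembling the projection-space coordinate from the two measurements, as in the worked example above (function name is illustrative):

```python
def projection_space_coords(screen_xy, real_distance):
    """Use the real distance to the depth camera as the Z coordinate and
    combine it with the screen space (X, Y) to form the 3D coordinate of
    the keypoint in projection space."""
    x, y = screen_xy
    return (x, y, real_distance)

# With the example above: screen coords (1, 2) and distance 0.5
# give projection-space coords (1, 2, 0.5).
```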
  • step S520 the projection matrix is determined according to the angle of view of the photographing device.
  • the field of view refers to the range that the lens can cover, that is, the angle formed by the two sides of the maximum range within which the measured object (the hand) can pass through the lens; the wider this range, the larger the field of view angle.
  • a parallel light source may be used to measure the field of view angle
  • a luminance meter may also be used to measure the brightness distribution of the shooting device to obtain the field of view angle
  • a spectrophotometer may also be used to measure the field of view angle.
  • the corresponding projection matrix can be determined according to the field of view angle to convert the three-dimensional coordinates of the projection space to the coordinate system of the virtual world.
  • the projection matrix is used to map the coordinates of each point to a two-dimensional screen, and the projection matrix will not change due to changes in the position of the model in the scene or the movement of the observer, and only needs to be initialized once.
  • Each shooting device can correspond to one or more projection matrices.
  • the projection matrix is determined by a four-dimensional vector of parameters: the distance of the near plane, the distance of the far plane, the FOV, and the display aspect ratio.
  • the projection matrix can be obtained directly from the application, or can be obtained by adaptive training using multiple key frames rendered after the application is started.
  • step S530 the three-dimensional coordinates of the projection space are converted into three-dimensional coordinates in the virtual world based on the projection matrix.
  • the three-dimensional coordinates of the key points of the preset location in the projection space can be transformed according to the projection matrix to obtain the three-dimensional coordinates of the key points of the preset location in the virtual world.
  • the coordinate system corresponding to the three-dimensional coordinates in the virtual world and the placed virtual object belong to the same coordinate system.
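A hedged sketch of steps S520 and S530, using the conventional OpenGL-style perspective matrix built from the four parameters mentioned above; the patent does not give an explicit matrix or inverse mapping, so this formulation is an assumption:

```python
import numpy as np

def perspective_matrix(fov_y_deg, aspect, near, far):
    """Standard OpenGL-style projection matrix built from the FOV, aspect
    ratio, and near/far plane distances (conventional form, assumed)."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    m = np.zeros((4, 4))
    m[0, 0] = f / aspect
    m[1, 1] = f
    m[2, 2] = (far + near) / (near - far)
    m[2, 3] = (2.0 * far * near) / (near - far)
    m[3, 2] = -1.0
    return m

def unproject(point_ndc, proj):
    """Map a 3D point from projection space back toward the camera/world
    coordinate system by inverting the projection matrix."""
    p = np.append(np.asarray(point_ndc, dtype=float), 1.0)
    out = np.linalg.inv(proj) @ p
    return out[:3] / out[3]
```

A round trip through the matrix and its inverse recovers the original view-space point, which is the conversion step S530 relies on.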
  • the process of estimating the three-dimensional coordinates of the key points of the preset part can be avoided, eliminating the estimation step and the errors it introduces; this improves accuracy, yields accurate three-dimensional coordinates, and at the same time improves computational efficiency, so accurate three-dimensional coordinates are obtained quickly.
  • step S130 the spatial relationship between the key points of the preset location and the virtual object in the virtual world is determined according to the three-dimensional coordinates, and the preset location is controlled based on the spatial relationship The key point of interacting with the virtual object.
  • the spatial relationship refers to whether the key points of the preset part are in contact with the virtual object, or to the positional relationship between the key points of the preset part and the virtual object; specifically, the relationship between the two can be expressed as a distance. Further, the key points of the preset part can be controlled to interact with the virtual object according to this spatial relationship, so as to realize a precise interaction process between the user and the virtual object in the augmented reality scene.
  • Fig. 6 schematically shows a flow chart of controlling the interaction between the key points of the preset position and the virtual object. Refer to Fig. 6, which specifically includes steps S610 to S630, in which:
  • step S610 the three-dimensional coordinates in the virtual world of the key points of the preset part interacting with the virtual object are acquired.
  • the key point of the preset position for interacting with the virtual object can be any of the key points shown in Figure 3, such as the fingertip of the index finger or the tail of the thumb, etc.
  • the fingertip of the index finger is taken as an example for explanation. If it is the fingertip of the index finger that is interacting with the virtual object, the index-finger key point numbered 8 is determined according to the correspondence between the key points of the preset part and those shown in FIG. 3. Further, the three-dimensional coordinates of key point 8 in the virtual world can be obtained through the processes of step S110 and step S120.
  • step S620 the distance between the three-dimensional coordinates in the virtual world and the coordinates of the virtual object is calculated.
  • the coordinates of the virtual object refer to the coordinates of the center point of the virtual object in the virtual world, or the collision box of the virtual object.
  • the distance between the two can be calculated according to the distance calculation formula.
  • the distance here includes but is not limited to Euclidean distance, cosine distance and so on.
  • The distance calculation formula (1) can be, for example, the standard Euclidean distance: d = √((x₁ − x₂)² + (y₁ − y₂)² + (z₁ − z₂)²), where (x₁, y₁, z₁) are the three-dimensional coordinates of the key point in the virtual world and (x₂, y₂, z₂) are the coordinates of the virtual object.
  • step S630 if the distance meets the preset distance, the interaction between the key points of the preset part and the virtual object is triggered.
  • the preset distance refers to a preset threshold for triggering the interaction.
  • the preset distance can be a small value, such as 5 cm or 10 cm.
  • The distance between the three-dimensional coordinates of the hand key points in the virtual world and the coordinates of the virtual object obtained in step S620 may be compared with the preset distance, so as to determine whether to trigger the interaction according to the comparison result. Specifically, if the distance is less than or equal to the preset distance, the key points of the preset part are triggered to interact with the virtual object; if the distance is greater than the preset distance, no interaction is triggered. For example, for an index-finger click on a virtual object, the three-dimensional coordinates (X, Y, Z) of the key point numbered 8 are first taken in the virtual world; the Euclidean distance between these coordinates and the center point of the virtual object is then calculated; and the click operation is triggered when this distance is less than the preset distance (for example, 5 cm).
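The distance test described above can be sketched in a few lines. The coordinate values, the 5 cm threshold, and the function names below are illustrative assumptions, not values taken from the patent.

```python
import math

INTERACTION_DISTANCE = 0.05  # assumed preset distance threshold, e.g. 5 cm (meters)

def euclidean_distance(p, q):
    """Euclidean distance between two 3D points, as in formula (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def should_trigger(keypoint_world, object_center, threshold=INTERACTION_DISTANCE):
    """Trigger the interaction when the key point lies within the preset distance."""
    return euclidean_distance(keypoint_world, object_center) <= threshold

# Example: index-fingertip key point (number 8) near a virtual object's center.
fingertip = (0.10, 0.25, 0.60)
obj_center = (0.12, 0.26, 0.62)
print(should_trigger(fingertip, obj_center))  # distance 0.03 m < 0.05 m → True
```

In a real scene the object coordinate would be the virtual object's center point or a point on its collision box, as the text notes.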
  • Fig. 7 schematically shows a flow chart of triggering the key points of the preset position to interact with the virtual object. Referring to Fig. 7, it specifically includes step S710 and step S720, wherein:
  • step S710 the current action of the key point of the preset part is identified.
  • In this step, it can first be determined which kind of action the current action of the key points of the preset part belongs to, for example whether it is a click, a press, or a flip.
  • Specifically, the action of the key points of the preset part can be determined and recognized according to the characteristics of the key points of the preset part, their movement track, and so on, which will not be described in detail here.
  • In step S720, the current action is matched against multiple preset actions, and interaction with the virtual object is performed in response to the current action according to the matching result, wherein the multiple preset actions correspond one-to-one with interactive operations.
  • multiple preset actions refer to standard actions or reference actions stored in the database in advance, including but not limited to clicking, pushing, flicking, pressing, flipping, etc.
  • An interactive operation refers to the interaction between the key points of the preset part and the virtual object corresponding to each preset action. For example, clicking corresponds to a selection operation, pushing corresponds to closing, toggling corresponds to scrolling left and right, pressing down corresponds to confirming, flipping corresponds to returning, and so on. It should be noted that the one-to-one correspondence between preset actions and interactive operations can be adjusted according to actual needs and is not specially limited here.
  • the current movement of the identified key point of the hand can be matched with multiple preset movements stored in the database. Specifically, the similarity between the two can be calculated, and when the similarity is greater than a preset threshold, the one with the highest similarity is determined as the preset action for successful matching to improve the accuracy. Furthermore, the interaction can be performed in response to the current action according to the matching result. Specifically, the interaction operation corresponding to the preset action that is successfully matched may be determined as the interaction operation corresponding to the current action in step S710, so as to realize the process of interacting with the virtual object according to the current action. For example, if the determined current action is an operation of clicking a virtual object with an index finger, a corresponding selection operation can be performed.
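The matching step above can be sketched as follows. The feature vectors, the similarity threshold, and the action-to-operation table are hypothetical placeholders: the patent does not specify how actions are encoded, only that the most similar preset action above a threshold is selected and its interactive operation is performed.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two action feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical reference vectors for preset actions stored in the database;
# a real system would derive these from key-point trajectories.
PRESET_ACTIONS = {
    "click": [1.0, 0.0, 0.0],
    "push": [0.0, 1.0, 0.0],
    "flip": [0.0, 0.0, 1.0],
}
INTERACTIONS = {"click": "select", "push": "close", "flip": "return"}
SIMILARITY_THRESHOLD = 0.8  # assumed preset threshold

def match_action(current):
    """Return the interactive operation of the most similar preset action,
    or None when no preset action exceeds the threshold."""
    best_name, best_sim = None, 0.0
    for name, ref in PRESET_ACTIONS.items():
        sim = cosine_similarity(current, ref)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim > SIMILARITY_THRESHOLD:
        return INTERACTIONS[best_name]
    return None

print(match_action([0.95, 0.05, 0.02]))  # most similar to "click" → "select"
```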
  • Fig. 8 shows the overall flow chart of the interaction between the user and the virtual object in augmented reality. Referring to Fig. 8, it mainly includes the following steps:
  • step S801 a color image collected by a monocular camera is acquired.
  • step S802 the key points of the hand are detected to obtain the screen space coordinates.
  • step S803 the depth image collected by the depth camera is acquired, and the real distance can be obtained from the depth image.
  • step S804 the screen space coordinates and depth information are combined, where the depth information refers to the real distance of the key point of the hand from the depth camera.
  • step S805 the three-dimensional coordinates of the key points of the hand in the virtual world are obtained.
  • step S806 the spatial relationship between the key points of the hand and the virtual object is calculated to interact according to the spatial relationship.
  • The method provided in Fig. 8 combines the screen space coordinates of the key points of the preset part with the real distance to the shooting device to obtain the three-dimensional coordinates of the key points of the preset part in the virtual world. This avoids the step of estimating the three-dimensional coordinates and the error it introduces, improves accuracy, and yields precise three-dimensional coordinates on which accurate interaction can be based. Since the three-dimensional coordinates of the key points of the preset part can be obtained by combining the screen space coordinates and the real distance, the coordinate estimation process is avoided, computational efficiency is improved, and accurate three-dimensional coordinates can be obtained quickly.
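The combination of screen space coordinates and real distance described above can be sketched with a simple pinhole back-projection. The patent's actual projection matrix is application-specific, so the intrinsics here are derived from an assumed vertical field of view, the camera is taken as the virtual-world origin, and all numeric values are illustrative.

```python
import math

def backproject(u, v, depth, width, height, fov_y_deg):
    """Back-project a screen-space key point (u, v) with its measured real
    distance (depth, meters) into 3D coordinates. A minimal pinhole-camera
    sketch: focal length is derived from an assumed vertical field of view,
    and square pixels are assumed."""
    fy = (height / 2.0) / math.tan(math.radians(fov_y_deg) / 2.0)
    fx = fy  # square-pixel assumption
    cx, cy = width / 2.0, height / 2.0
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# A key point at the image centre lies on the optical axis:
print(backproject(320, 240, 0.5, 640, 480, 60.0))  # → (0.0, 0.0, 0.5)
```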
  • the spatial relationship between the key points of the preset part and the virtual objects in the virtual world determined according to the three-dimensional coordinates can accurately control the key points of the preset parts to interact with the virtual objects, improving user experience.
  • an interactive control device is also provided. As shown in FIG. 9, the device 900 may include:
  • the parameter obtaining module 901 is configured to obtain the screen space coordinates of the key points of the preset part, and obtain the real distance of the key points of the preset part relative to the shooting device;
  • a three-dimensional coordinate calculation module 902 configured to combine the real distance with the screen space coordinates to determine the three-dimensional coordinates of the key points of the preset position in the virtual world;
  • the interaction execution module 903 is configured to determine the spatial relationship between the key points of the preset part and the virtual object in the virtual world according to the three-dimensional coordinates, and control the key points of the preset part to interact with the virtual object based on the spatial relationship.
  • The parameter acquisition module includes: a first image acquisition module, configured to acquire a first image containing the preset part collected by a monocular camera; and a screen space coordinate determination module, configured to perform key point detection on the first image to obtain the screen space coordinates of the key points of the preset part.
  • The screen space coordinate determination module includes: a key point detection module, configured to process the first image through the trained convolutional neural network model to obtain the key points of the preset part; and a coordinate determination module, configured to perform regression processing on the key points of the preset part to obtain their position information and use the position information as the screen space coordinates.
  • the photographing device includes a depth camera
  • the parameter acquisition module includes: a second acquisition module configured to acquire a second image collected by the depth camera and containing the preset part;
  • the image alignment module is used to align the first image and the second image;
  • the real distance acquisition module is configured to sample the aligned second image at the screen space coordinates to obtain the real distance from the key points of the preset part to the depth camera.
  • The three-dimensional coordinate calculation module includes: a reference coordinate acquisition module, configured to obtain the three-dimensional coordinates of the key points of the preset part in projection space according to the real distance and the screen space coordinates; a matrix calculation module, configured to determine a projection matrix according to the field of view of the shooting device; and a coordinate conversion module, configured to convert the three-dimensional coordinates in projection space into three-dimensional coordinates in the virtual world based on the projection matrix.
  • the interaction execution module includes: a three-dimensional coordinate acquisition module configured to acquire three-dimensional coordinates in the virtual world of key points of a preset part that interacts with the virtual object;
  • the distance calculation module is configured to calculate the distance between the three-dimensional coordinates and the coordinates of the virtual object;
  • the interaction judgment module is configured to trigger the key points of the preset part to interact with the virtual object if the distance satisfies the preset distance.
  • The interaction judgment module includes: an action recognition module, configured to recognize the current action of the key points of the preset part; and an interaction trigger module, configured to match the current action against multiple preset actions and interact with the virtual object in response to the current action according to the matching result, wherein the multiple preset actions correspond one-to-one with interactive operations.
  • It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.
  • an electronic device capable of implementing the above method is also provided.
  • the electronic device 1000 according to this embodiment of the present invention will be described below with reference to FIG. 10.
  • the electronic device 1000 shown in FIG. 10 is only an example, and should not bring any limitation to the function and application scope of the embodiment of the present invention.
  • the electronic device 1000 is represented in the form of a general-purpose computing device.
  • the components of the electronic device 1000 may include but are not limited to: the aforementioned at least one processing unit 1010, the aforementioned at least one storage unit 1020, and a bus 1030 connecting different system components (including the storage unit 1020 and the processing unit 1010).
  • The storage unit stores program code, and the program code can be executed by the processing unit 1010, so that the processing unit 1010 executes the steps of the various exemplary embodiments described in the "Exemplary Methods" section of this specification. For example, the processing unit 1010 may perform the steps shown in Fig. 1: in step S110, the screen space coordinates of the key points of the preset part are obtained, and the real distance of the key points of the preset part relative to the shooting device is obtained; in step S120, the real distance is combined with the screen space coordinates to determine the three-dimensional coordinates of the key points of the preset part in the virtual world; in step S130, the spatial relationship between the key points of the preset part and the virtual object in the virtual world is determined according to the three-dimensional coordinates, and the key points of the preset part are controlled to interact with the virtual object based on the spatial relationship.
  • the storage unit 1020 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 10201 and/or a cache storage unit 10202, and may further include a read-only storage unit (ROM) 10203.
  • The storage unit 1020 may also include a program/utility tool 10204 having a set of (at least one) program modules 10205, such program modules 10205 including but not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
  • The bus 1030 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.
  • the display unit 1040 may be a display having a display function to display the processing result obtained by the processing unit 1010 by executing the method in this exemplary embodiment through the display.
  • the display includes, but is not limited to, a liquid crystal display or other displays.
  • The electronic device 1000 can also communicate with one or more external devices 1200 (such as keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (such as a router, modem, etc.) that enables the electronic device 1000 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 1050.
  • the electronic device 1000 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 1060.
  • The network adapter 1060 communicates with other modules of the electronic device 1000 through the bus 1030. It should be understood that although not shown in the figure, other hardware and/or software modules can be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • The exemplary embodiments described herein can be implemented by software, or by software combined with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, USB flash drive, removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the method according to the embodiments of the present disclosure.
  • a computer-readable storage medium on which is stored a program product capable of implementing the above method in this specification.
  • In some possible implementations, various aspects of the present invention may also be implemented in the form of a program product, which includes program code; when the program product runs on a terminal device, the program code is used to cause the terminal device to execute the steps according to the various exemplary embodiments of the present invention described in the above "Exemplary Method" section of this specification.
  • A program product 1100 for implementing the above method according to an embodiment of the present invention is described; it may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer.
  • the program product of the present invention is not limited thereto.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the program product can use any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • The readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code used to perform the operations of the present invention can be written in any combination of one or more programming languages.
  • The programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, executed as an independent software package, partly on the user's computing device and partly executed on the remote computing device, or entirely on the remote computing device or server Executed on.
  • The remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, through the Internet using an Internet service provider).


Abstract

An interaction control method and apparatus, an electronic device, and a computer-readable storage medium, relating to the field of computer technology. The interaction control method includes: acquiring the screen space coordinates of key points of a preset part, and acquiring the real distance of the key points of the preset part relative to a shooting device (S110); combining the real distance with the screen space coordinates to determine the three-dimensional coordinates of the key points of the preset part in a virtual world (S120); and determining the spatial relationship between the key points of the preset part and a virtual object in the virtual world according to the three-dimensional coordinates, and controlling the key points of the preset part to interact with the virtual object based on the spatial relationship (S130). The method can accurately obtain the three-dimensional coordinates of the key points of the preset part in the virtual world, and thereby control the key points of the preset part to interact precisely with the virtual object.

Description

Interaction Control Method and Apparatus, Electronic Device, and Storage Medium

Technical Field

The present disclosure relates to the field of computer technology, and in particular to an interaction control method, an interaction control apparatus, an electronic device, and a computer-readable storage medium.
Background

In augmented reality, precise interaction between the user and virtual objects is particularly important. In the related art, the environment is first reconstructed to form a virtual world, and arbitrary virtual objects are then placed in the virtual world. To interact with a placed virtual object, the hand must be tracked using color images collected by a camera to obtain the position information of the hand, which is then used to perform interactions with the virtual object such as picking up, placing, and rotating.

In the above approach, the hand position information obtained by hand tracking consists only of two-dimensional coordinates in screen space. When interacting with a virtual object, the two-dimensional coordinates must additionally be estimated into three-dimensional coordinates in the virtual world, so that spatial computations can be performed against the three-dimensional coordinates of the virtual object. However, the step of estimating three-dimensional coordinates from two-dimensional coordinates may introduce a considerable error, making the estimated three-dimensional coordinates inaccurate and thus preventing precise interaction. In addition, the estimation process may reduce operating efficiency and degrade the interaction experience.

It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary

The purpose of the present disclosure is to provide an interaction control method and apparatus, an electronic device, and a computer-readable storage medium, thereby overcoming, at least to some extent, the inability to achieve precise interaction caused by the limitations and defects of the related art.

Other characteristics and advantages of the present disclosure will become apparent from the following detailed description, or will be learned in part through practice of the present disclosure.
According to one aspect of the present disclosure, an interaction control method is provided, including: acquiring the screen space coordinates of key points of a preset part, and acquiring the real distance of the key points of the preset part relative to a shooting device; combining the real distance with the screen space coordinates to determine the three-dimensional coordinates of the key points of the preset part in a virtual world; and determining the spatial relationship between the key points of the preset part and a virtual object in the virtual world according to the three-dimensional coordinates, and controlling the key points of the preset part to interact with the virtual object based on the spatial relationship.

In an exemplary embodiment of the present disclosure, acquiring the screen space coordinates of the key points of the preset part includes: acquiring a first image containing the preset part collected by a monocular camera; and performing key point detection on the first image to obtain the screen space coordinates of the key points of the preset part.

In an exemplary embodiment of the present disclosure, performing key point detection on the first image to obtain the screen space coordinates of the key points of the preset part includes: processing the first image through a trained convolutional neural network model to obtain the key points of the preset part; and performing regression processing on the key points of the preset part to obtain the position information of the key points of the preset part and using the position information as the screen space coordinates.

In an exemplary embodiment of the present disclosure, the shooting device includes a depth camera, and acquiring the real distance of the key points of the preset part relative to the shooting device includes: acquiring a second image containing the preset part collected by the depth camera; performing an alignment operation on the first image and the second image; and sampling the aligned second image at the screen space coordinates to obtain the real distance from the key points of the preset part to the depth camera.

In an exemplary embodiment of the present disclosure, combining the real distance with the screen space coordinates to determine the three-dimensional coordinates of the key points of the preset part in the virtual world includes: obtaining the three-dimensional coordinates of the key points of the preset part in projection space according to the real distance and the screen space coordinates; determining a projection matrix according to the field of view of the shooting device; and converting the three-dimensional coordinates in projection space into three-dimensional coordinates in the virtual world based on the projection matrix.

In an exemplary embodiment of the present disclosure, determining the spatial relationship between the key points of the preset part and the virtual object in the virtual world according to the three-dimensional coordinates, and controlling the key points of the preset part to interact with the virtual object based on the spatial relationship, includes: acquiring the three-dimensional coordinates in the virtual world of the key points of the preset part interacting with the virtual object; calculating the distance between the three-dimensional coordinates and the coordinates of the virtual object; and, if the distance satisfies a preset distance, triggering the key points of the preset part to interact with the virtual object.

In an exemplary embodiment of the present disclosure, triggering the key points of the preset part to interact with the virtual object includes: recognizing the current action of the key points of the preset part; and matching the current action against multiple preset actions, and interacting with the virtual object in response to the current action according to the matching result, wherein the multiple preset actions correspond one-to-one with interactive operations.
According to one aspect of the present disclosure, an interaction control apparatus is provided, including: a parameter acquisition module, configured to acquire the screen space coordinates of key points of a preset part and acquire the real distance of the key points of the preset part relative to a shooting device; a three-dimensional coordinate calculation module, configured to combine the real distance with the screen space coordinates to determine the three-dimensional coordinates of the key points of the preset part in a virtual world; and an interaction execution module, configured to determine the spatial relationship between the key points of the preset part and a virtual object in the virtual world according to the three-dimensional coordinates, and control the key points of the preset part to interact with the virtual object based on the spatial relationship.

According to one aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing executable instructions of the processor, wherein the processor is configured to execute any one of the interaction control methods described above by executing the executable instructions.

According to one aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, any one of the interaction control methods described above is implemented.
In the interaction control method and apparatus, electronic device, and computer-readable storage medium provided by this exemplary embodiment, on the one hand, the screen space coordinates of the key points of the preset part are combined with the real distance to the shooting device to obtain the three-dimensional coordinates of the key points of the preset part in the virtual world; this avoids the step of estimating the three-dimensional coordinates, reduces the error caused by the estimation step, improves accuracy, yields accurate three-dimensional coordinates, and enables precise interaction based on those coordinates. On the other hand, since the three-dimensional coordinates of the key points of the preset part can be obtained by combining the screen space coordinates and the real distance, no coordinate estimation is needed, computational efficiency is improved, and the accurate three-dimensional coordinates of the key points of the preset part in the virtual world can be obtained quickly. Furthermore, with the spatial relationship between the key points of the preset part and the virtual object in the virtual world determined according to the three-dimensional coordinates, the key points of the preset part can be precisely controlled through that spatial relationship to interact with the virtual object, improving the user experience.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the specification serve to explain the principles of the present disclosure. Obviously, the drawings in the following description are only some embodiments of the present disclosure; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

Fig. 1 schematically shows a schematic diagram of an interaction control method in an exemplary embodiment of the present disclosure.

Fig. 2 schematically shows a flow chart of determining screen space coordinates in an exemplary embodiment of the present disclosure.

Fig. 3 schematically shows a schematic diagram of hand key points in an exemplary embodiment of the present disclosure.

Fig. 4 schematically shows a flow chart of determining the real distance in an exemplary embodiment of the present disclosure.

Fig. 5 schematically shows a flow chart of calculating three-dimensional coordinates in the virtual world in an exemplary embodiment of the present disclosure.

Fig. 6 schematically shows a flow chart of controlling key points of a preset part to interact with a virtual object in an exemplary embodiment of the present disclosure.

Fig. 7 schematically shows a specific flow chart of triggering key points of a preset part to interact with a virtual object in an exemplary embodiment of the present disclosure.

Fig. 8 schematically shows an overall flow chart of the interaction between key points of a preset part and a virtual object in an exemplary embodiment of the present disclosure.

Fig. 9 schematically shows a block diagram of an interaction control apparatus in an exemplary embodiment of the present disclosure.

Fig. 10 schematically shows a schematic diagram of an electronic device in an exemplary embodiment of the present disclosure.

Fig. 11 schematically shows a schematic diagram of a computer-readable storage medium in an exemplary embodiment of the present disclosure.
Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be more comprehensive and complete, and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, numerous specific details are provided to give a thorough understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced while omitting one or more of the specific details, or that other methods, components, apparatuses, steps, etc. may be employed. In other instances, well-known technical solutions are not shown or described in detail to avoid obscuring aspects of the present disclosure.

In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repeated descriptions thereof will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
To solve the problem in the related art that estimating the three-dimensional coordinates of the hand in the virtual world affects the interaction process, this exemplary embodiment provides an interaction control method, which can be applied to any scenario in the field of augmented reality, such as augmented-reality-based games, education, daily life, and other application scenarios. Next, referring to Fig. 1, the interaction control method in this exemplary embodiment is described in detail.

In step S110, the screen space coordinates of the key points of a preset part are acquired, and the real distance of the key points of the preset part relative to a shooting device is acquired.

In this exemplary embodiment, the preset part may be any part capable of interacting with a virtual object in the virtual world (virtual space), including but not limited to the user's hand or head. In this exemplary embodiment, the user's hand is taken as the example preset part, and the hand here refers to the one or two hands with which the user interacts with the virtual object.

Screen space coordinates refer to the two-dimensional coordinates (X and Y coordinates) in image space displayed on the screen; they are not affected by an object's position in space, only by the object itself and the viewport. Specifically, the screen space coordinates of the hand key points can be obtained by performing key point detection on the hand. Hand key point detection is the process of locating the joints on the fingers and the fingertips in an image containing the hand. A key point is an abstract description of a fixed region: it represents not only a point or position, but also the contextual relationship with the surrounding neighborhood.
Fig. 2 shows a specific flow chart of obtaining the screen space coordinates. Referring to Fig. 2, acquiring the screen space coordinates of the key points of the preset part may include steps S210 to S230, wherein:

In step S210, a first image containing the preset part collected by a monocular camera is acquired.

In this step, the monocular camera reflects the three-dimensional world in two-dimensional form. The monocular camera here may be provided on a mobile phone, on a camera, or on another shooting device used to collect images. The first image refers to a color image captured by the monocular camera. Specifically, the monocular camera may collect the color image containing the hand from any angle and at any distance; the angle and distance are not specially limited here, as long as the hand can be clearly shown.

In step S220, key point detection is performed on the first image to obtain the screen space coordinates of the key points of the preset part.

In this step, key point detection may be performed on the preset part based on the color image obtained in step S210. The specific process of performing key point detection on the preset part to obtain the screen space coordinates may include steps S230 and S240, wherein:
In step S230, the first image is processed through a trained convolutional neural network model to obtain the key points of the preset part.

In this step, the convolutional neural network model may first be trained to obtain a trained model. A small amount of annotated data containing a certain hand key point may be used to train the convolutional neural network model. Specifically, multiple shooting devices with different viewing angles may be used to photograph the hand; the above convolutional neural network model preliminarily detects the key points; these key points are triangulated according to the poses of the shooting devices to obtain the three-dimensional positions of the key points; the computed three-dimensional positions are then reprojected onto each two-dimensional image from a different viewing angle; and the convolutional neural network model is trained using these two-dimensional images and key point annotations. After multiple iterations, a fairly accurate hand key point detection model, i.e. the trained convolutional neural network model, can be obtained. Further, the color image containing the hand collected in step S210 may be input into the trained convolutional neural network model to accurately detect the hand key points.

In step S240, regression processing is performed on the key points of the preset part to obtain the position information of the key points of the preset part, and the position information is used as the screen space coordinates.

In this step, after the hand key points are detected, regression processing may be performed on them. Regression processing refers to quantitatively describing the relationship between variables in the form of probabilities. The model used for regression processing may be a linear regression model, a logistic regression model, or the like, as long as it can realize this function. Specifically, the hand key points may be input into the regression model to obtain their position information, where the output corresponding to each hand key point is its X and Y coordinates in image space. The image coordinate system in image space takes the center of the image plane as the coordinate origin, with the X axis and Y axis parallel to the two perpendicular edges of the image plane respectively, and its coordinate values are denoted (X, Y).
Fig. 3 shows a schematic diagram of the hand key points. Referring to Fig. 3, for a color image containing a hand, 21 key points of the hand (key points numbered 0 to 20) can be generated.

In addition, in this exemplary embodiment, the real distance of the key points of the preset part relative to the shooting device may also be acquired. The real distance refers to the real physical distance between the key points of the preset part and the shooting device, for example 1 meter, 2 meters, and so on.

Fig. 4 shows a schematic diagram of acquiring the real distance of the key points of the preset part relative to the shooting device. Referring to Fig. 4, it mainly includes steps S410 to S430, wherein:
In step S410, a second image containing the preset part collected by the depth camera is acquired.

In this step, the shooting device refers to a depth camera used to capture the second image containing the hand, and the second image is a depth image captured by the depth camera. The depth camera includes but is not limited to a TOF (Time of Flight) camera; it may also be another camera used to measure depth, such as any one of an infrared distance sensor camera, a structured light camera, and a laser structure camera. In this exemplary embodiment, a TOF camera is taken as an example for description.

A TOF camera may consist of several units such as a lens, a light source, optical components, a sensor, a control circuit, and a processing circuit. The TOF camera adopts an active light detection method whose main purpose is to measure distance using the change between the emitted light signal and the reflected light signal. Specifically, the principle by which the TOF module acquires the second image of the hand is as follows: continuous near-infrared pulses are emitted toward the target scene, and a sensor then receives the light pulses reflected back by the hand. By comparing the phase difference between the emitted light pulses and the light pulses reflected by the hand, the transmission delay between the light pulses can be computed, giving the distance of the hand relative to the emitter and finally yielding a depth image of the hand. Acquiring the second image of the hand through the depth camera avoids the increased cost and inconvenience of measuring depth information with additional sensors outside the terminal.
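The time-of-flight principle described above reduces to a one-line relation: the pulse travels to the hand and back, so the distance is half the round-trip delay times the speed of light. A minimal sketch, with an illustrative delay value:

```python
C = 299_792_458.0  # speed of light, m/s

def tof_distance(delay_seconds):
    """Distance inferred from the round-trip delay of a light pulse:
    the pulse travels to the hand and back, so distance = c * t / 2."""
    return C * delay_seconds / 2.0

# A round-trip delay of ~6.67 ns corresponds to roughly 1 m:
print(round(tof_distance(6.67e-9), 3))
```

In practice ToF modules estimate this delay from the phase difference between the emitted and received modulated signals rather than timing a single pulse directly.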
It should be noted that the second image collected by the depth camera in step S410 and the first image collected by the monocular camera in step S210 are collected at the same time, to ensure a one-to-one correspondence between the collected color image and depth image.
In step S420, an alignment operation is performed on the first image and the second image.

In this step, since the second image and the first image are collected at the same time, there is a one-to-one correspondence between the two images, and they are different representations, on the two images, of the same point in real space. Since the resolution of the color image is greater than that of the depth image, and the color image and the depth image differ in size, an alignment operation needs to be performed on the color image and the depth image to improve the accuracy of combining them. The alignment operation refers to an operation that makes the sizes of the color image and the depth image the same. The alignment operation may be, for example: directly scaling the color image or the depth image, or post-processing the depth image to enlarge its resolution; of course, other alignment methods may also be used and are not specially limited here.

In step S430, the aligned second image is sampled at the screen space coordinates to obtain the real distance from the key points of the preset part to the depth camera.

In this step, after the color image and the depth image are aligned, the screen space coordinates (X and Y coordinates) obtained in Fig. 2 can be used directly to sample the aligned depth image, giving the actual physical distance between the hand key points and the depth camera. By combining the screen space coordinates with the depth image in this way, the real physical distance of the hand key points from the depth camera can be obtained accurately.
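The sampling step above can be sketched as follows. Alignment is reduced here to a simple resolution rescaling between the colour and depth images (one of the options the text mentions), and the resolutions and depth values are hypothetical.

```python
def sample_depth(depth_map, screen_x, screen_y, color_size, depth_size):
    """Sample the aligned depth image at a key point's screen-space
    coordinates. Alignment is sketched as coordinate scaling between the
    colour resolution and the depth resolution."""
    (cw, ch), (dw, dh) = color_size, depth_size
    dx = int(screen_x * dw / cw)
    dy = int(screen_y * dh / ch)
    return depth_map[dy][dx]  # real distance of the key point, in meters

# Hypothetical 4x4 depth map aligned against a 640x480 colour image:
depth = [[0.5] * 4 for _ in range(4)]
depth[1][2] = 0.8
print(sample_depth(depth, 400, 180, (640, 480), (4, 4)))  # → 0.8
```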
Continuing to refer to Fig. 1, in step S120, the real distance is combined with the screen space coordinates to determine the three-dimensional coordinates of the key points of the preset part in the virtual world.

In this exemplary embodiment, the virtual world refers to a virtual world formed by reconstructing the environment, used for placing virtual objects and for interaction. Since the coordinates obtained in step S110 are the coordinates of the hand key points in projection space, in order to obtain the coordinates of the hand key points in the virtual world, their coordinates in projection space can be converted.

Fig. 5 schematically shows the specific process of calculating the three-dimensional coordinates in the virtual world. Referring to Fig. 5, it mainly includes steps S510 to S530, wherein:

In step S510, the three-dimensional coordinates of the key points of the preset part in projection space are obtained according to the real distance and the screen space coordinates.

In this step, the screen space coordinates are the two-dimensional coordinates of the key points of the preset part in projection space, and the real distance from the key points of the preset part to the depth camera can be used as their Z-axis coordinate in projection space, so that the real physical distance and the screen space coordinates are combined into the three-dimensional coordinates (X, Y, Z) of the key points of the preset part in projection space. For example, if the screen space coordinates of hand key point 1 obtained from color image 1 are (1, 2), and the real physical distance of hand key point 1 from the depth camera obtained from depth image 2 is 0.5, then the three-dimensional coordinates of hand key point 1 in projection space can be taken to be (1, 2, 0.5).
In step S520, a projection matrix is determined according to the field of view of the shooting device.

In this step, the field of view refers to the range that the lens can cover, i.e. the angle formed by the two edges of the maximum range within which the image of the measured target (the hand) can pass through the lens; the larger the field of view, the larger the visual field. Specifically, the field of view may be measured using a parallel light source, by using a luminance meter to measure the luminance distribution of the shooting device, or by using a spectrophotometer.

After the field of view is obtained, the corresponding projection matrix can be determined according to the field of view, so as to convert the three-dimensional coordinates in projection space into the coordinate system of the virtual world. The projection matrix is used to map the coordinates of each point onto the two-dimensional screen; it does not change with the position of models in the scene or the movement of the observer, and only needs to be initialized once. Each shooting device may correspond to one or more projection matrices, and the projection matrix is a four-dimensional quantity determined by the near plane distance, the far plane distance, the field of view (FOV), and the display aspect ratio. The projection matrix can be obtained directly from the application, or obtained through adaptive training on multiple key frames rendered after the application starts.
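A projection matrix of the kind described, built from the field of view, aspect ratio, and near/far plane distances, can be sketched with the standard perspective formulation. This is a generic OpenGL-style matrix offered as an illustration, not the patent's specific matrix.

```python
import math

def perspective_matrix(fov_y_deg, aspect, near, far):
    """Standard perspective projection matrix from the vertical field of
    view, aspect ratio, and near/far plane distances -- the four quantities
    the text says the projection matrix depends on."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

m = perspective_matrix(60.0, 16 / 9, 0.1, 100.0)
print(round(m[1][1], 4))  # f = 1 / tan(30°) ≈ 1.7321
```

Inverting such a matrix (together with the view transform) is what converts projection-space coordinates back into world coordinates in step S530.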
Continuing to refer to Fig. 5, in step S530, the three-dimensional coordinates in projection space are converted into three-dimensional coordinates in the virtual world based on the projection matrix.

In this step, after the projection matrix is obtained, the three-dimensional coordinates of the key points of the preset part in projection space can be converted according to the projection matrix, to obtain the three-dimensional coordinates of the key points of the preset part in the virtual world. It should be noted that the coordinate system of the three-dimensional coordinates in the virtual world is the same coordinate system as that of the placed virtual objects.

In this exemplary embodiment, by combining the screen space coordinates with the real distance of the key points of the preset part relative to the shooting device, the process of estimating the coordinates of the key points of the preset part is avoided: the estimation step and the error it introduces are eliminated, accuracy is improved, and accurate three-dimensional coordinates can be obtained; at the same time, computational efficiency is improved and the accurate three-dimensional coordinates can be obtained quickly.
继续参考图1所示,在步骤S130中,根据所述三维坐标确定所述预设部位的关键点与所述虚拟世界中虚拟物体的空间关系,并基于所述空间关系控制所述预设部位的关键点与所述虚拟物体进行交互。
本示例性实施例中,空间关系指的是预设部位的关键点与虚拟物体之间是否接触或者是预设部位的关键点与虚拟物体之间的位置关系,具体可以用二者之间的距离来表示。进一步地,可根据预设部位的关键点与虚拟物体之间的空间关系来控制预设部位的关键点与虚拟物体进行交互,以实现用户和增强现实场景中虚拟物体的精准交互过程。
图6中示意性示出了控制预设部位的关键点和虚拟物体进行交互的流程图,参考图6中所示,具体包括步骤S610至步骤S630,其中:
在步骤S610中,获取与所述虚拟物体进行交互的预设部位的关键点的虚拟世界中的 三维坐标。
本步骤中,与虚拟物体进行交互的预设部位的关键点可以为图3中示出的任意一个关键点,例如食指的指尖或者是拇指的尾部等等,此处以食指的指尖为例进行说明。若与虚拟物体进行交互的是食指的指尖,则根据预设部位的关键点与图3中所示的对应关系,确定食指的指尖对应序号为8的关键点。进一步地,可根据步骤S110和步骤S120中的过程获取序号为8的关键点在虚拟世界中的三维坐标。
在步骤S620中,计算所述虚拟世界中的三维坐标与所述虚拟物体的坐标之间的距离。
本步骤中,虚拟物体的坐标指的是虚拟世界中虚拟物体的中心点的坐标,或者是虚拟物体的碰撞盒。在得到预设部位的关键点在虚拟世界中的三维坐标,以及虚拟物体的中心点的坐标后,可根据距离计算公式来计算二者之间的距离。此处的距离包括但不限于欧式距离、余弦距离等等。距离计算公式可以为公式(1)所示:
Figure PCTCN2020089448-appb-000001
在步骤S630中,若所述距离满足预设距离时,触发所述预设部位的关键点与所述虚拟物体之间进行交互。
本步骤中,预设距离指的是事先设置的用于触发交互的一个阈值,为了有效触发交互,预设距离可以为一个较小的数值,比如5厘米或者是10厘米等等。本示例性实施例中,可将步骤S620中得到的手部关键点的虚拟世界中的三维坐标与虚拟物体的坐标之间的距离与预设距离进行比较,从而根据比较结果来确定是否触发交互。具体地,若距离小于或者等于预设距离,则触发预设部位的关键点与虚拟物体进行交互;若距离大于预设距离,则不会触发预设部位的关键点与虚拟物体进行交互。举例而言,若进行的是食指点击虚拟物体的操作,首先根据关键点的序号,取序号为8的关键点在虚拟世界中的三维坐标(X,Y,Z);接下来计算序号为8的关键点的坐标与虚拟物体的中心点的欧氏距离;进一步地当欧式距离小于预设距离(5厘米)时触发该点击操作。
图7示意性示出触发预设部位的关键点与虚拟物体进行交互的流程图,参考图7中所示,具体包括步骤S710和步骤S720,其中:
在步骤S710中,识别所述预设部位的关键点的当前动作。
本步骤中,首先可确定预设部位的关键点的当前动作属于哪种动作,例如属于点击、下压、翻转等动作中的哪一种。具体可根据预设部位的关键点的特征以及预设部位的关键点的运动轨迹等等来确定和识别预设部位等的关键点的动作,此处不做详细描述。
在步骤S720中,将所述当前动作与多个预设动作进行匹配,并根据匹配结果响应所述动作与所述虚拟物体进行交互;其中,所述多个预设动作与交互操作一一对应。
本步骤中,多个预设动作指的是事先存储在数据库中的标准动作或者是参考动作,包括但不限于点击、推、拨动、下压、翻转等等。交互操作指的是每个预设动作对应的 预设部位的关键点与虚拟物体之间的交互。例如,点击对应选择操作、推对应关闭、拨动对应左右滚动、下压对应于确认、翻转对应于返回等等。需要说明的是,预设动作与交互操作之间的一一对应关系可根据实际需求进行调整,此处不做特殊限定。
进一步地,可将识别到的手部关键点的当前动作与数据库中存储的多个预设动作进行匹配。具体地,可计算二者之间的相似度,并在相似度大于预设的阈值时,将相似度最高的确定为匹配成功的预设动作,以提高准确性。再进一步地,可根据匹配结果来响应当前动作进行交互。具体地,可将匹配成功的预设动作对应的交互操作确定为步骤S710中的当前动作对应的交互操作,以根据当前动作实现与虚拟物体进行交互的过程。举例而言,若确定的当前动作为食指点击虚拟物体的操作,则可对应执行选择操作。
图8中示出了增强现实中用户与虚拟物体交互的整体流程图,参考图8中所示,主要包括以下步骤:
在步骤S801中,获取单目相机采集的彩色图像。
在步骤S802中,进行手部关键点检测,得到屏幕空间坐标。
在步骤S803中,获取深度相机采集的深度图像,具体可从深度图像中得到真实距离。
在步骤S804中,将屏幕空间坐标和深度信息进行结合,其中深度信息指的是手部关键点离深度相机的真实距离。
在步骤S805中,得到手部关键点在虚拟世界中的三维坐标。
在步骤S806中,计算手部关键点与虚拟物体的空间关系,以根据空间关系进行交互。
图8中提供的方法,结合预设部位的关键点的屏幕空间坐标和到拍摄设备的真实距离,得到预设部位的关键点在虚拟世界中的三维坐标,避免了对三维坐标进行估计的步骤以及导致的误差,提高了准确性,能够得到准确的三维坐标,进而基于该三维坐标实现精准交互。由于能够结合屏幕空间坐标和真实距离得到预设部位的关键点的三维坐标,避免了估计坐标的过程,提高了计算效率,能够快速得到准确的三维坐标。根据三维坐标确定的预设部位的关键点与虚拟世界中虚拟物体之间的空间关系,能够精准地控制预设部位的关键点与虚拟物体进行交互,提高用户体验。
本示例性实施例中,还提供了一种交互控制装置,参考图9所示,该装置900可以包括:
参数获取模块901,用于获取预设部位的关键点的屏幕空间坐标,并获取所述预设部位的关键点相对于拍摄设备的真实距离;
三维坐标计算模块902,用于将所述真实距离与所述屏幕空间坐标进行结合,确定所述预设部位的关键点的虚拟世界中的三维坐标;
交互执行模块903,用于根据所述三维坐标确定所述预设部位的关键点与所述虚拟世界中虚拟物体的空间关系,并基于所述空间关系控制所述预设部位的关键点与所述虚拟物体进行交互。
在本公开的一种示例性实施例中,参数获取模块包括:第一图像获取模块,用于获取单目相机采集的包含所述预设部位的第一图像;屏幕空间坐标确定模块,用于对所述第一图像进行关键点检测,以得到所述预设部位的关键点的所述屏幕空间坐标。
在本公开的一种示例性实施例中,屏幕空间坐标确定模块包括:关键点检测模块,用于通过训练好的卷积神经网络模型对所述第一图像进行处理,得到所述预设部位的关键点;坐标确定模块,用于对所述预设部位的关键点进行回归处理,得到所述预设部位的关键点的位置信息并将所述位置信息作为所述屏幕空间坐标。
在本公开的一种示例性实施例中,所述拍摄设备包括深度相机,参数获取模块包括:第二获取模块,用于获取所述深度相机采集的包含所述预设部位的第二图像;图像对齐模块,用于对所述第一图像与所述第二图像进行对齐操作;真实距离获取模块,用于将所述屏幕空间坐标在对齐后的第二图像上进行取值,以得到所述预设部位的关键点到所述深度相机的所述真实距离。
在本公开的一种示例性实施例中,三维坐标计算模块包括:参考坐标获取模块,用于根据所述真实距离与所述屏幕空间坐标得到所述预设部位的关键点的投影空间的三维坐标;矩阵计算模块,用于根据所述拍摄设备的视场角确定投影矩阵;坐标转换模块,用于基于所述投影矩阵将所述投影空间的三维坐标转换为所述虚拟世界中的三维坐标。
在本公开的一种示例性实施例中,交互执行模块包括:三维坐标获取模块,用于获取与所述虚拟物体进行交互的预设部位的关键点的所述虚拟世界中的三维坐标;距离计算模块,用于计算所述三维坐标与所述虚拟物体的坐标之间的距离;交互判断模块,用于若所述距离满足预设距离,则触发所述预设部位的关键点与所述虚拟物体进行交互。
在本公开的一种示例性实施例中,交互判断模块包括:动作识别模块,用于识别所述预设部位的关键点的当前动作;交互触发模块,用于将所述当前动作与多个预设动作进行匹配,并根据匹配结果响应所述当前动作与所述虚拟物体进行交互;其中,所述多个预设动作与交互操作一一对应。
需要说明的是,上述交互控制装置中各模块的具体细节已经在对应的方法中进行了详细阐述,因此此处不再赘述。
应当注意,尽管在上文详细描述中提及了用于动作执行的设备的若干模块或者单元,但是这种划分并非强制性的。实际上,根据本公开的实施方式,上文描述的两个或更多模块或者单元的特征和功能可以在一个模块或者单元中具体化。反之,上文描述的一个模块或者单元的特征和功能可以进一步划分为由多个模块或者单元来具体化。
此外,尽管在附图中以特定顺序描述了本公开中方法的各个步骤,但是,这并非要求或者暗示必须按照该特定顺序来执行这些步骤,或是必须执行全部所示的步骤才能实现期望的结果。附加的或备选的,可以省略某些步骤,将多个步骤合并为一个步骤执行,以及/或者将一个步骤分解为多个步骤执行等。
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

Those skilled in the art will understand that various aspects of the present invention may be implemented as a system, a method, or a program product. Therefore, various aspects of the present invention may be embodied in the following forms: a complete hardware implementation, a complete software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may be collectively referred to herein as a "circuit", "module", or "system".

An electronic device 1000 according to this embodiment of the present invention is described below with reference to FIG. 10. The electronic device 1000 shown in FIG. 10 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in FIG. 10, the electronic device 1000 takes the form of a general-purpose computing device. Components of the electronic device 1000 may include, but are not limited to: the above-mentioned at least one processing unit 1010, the above-mentioned at least one storage unit 1020, and a bus 1030 connecting different system components (including the storage unit 1020 and the processing unit 1010).

The storage unit stores program code, which can be executed by the processing unit 1010 so that the processing unit 1010 performs the steps according to various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification. For example, the processing unit 1010 may perform the steps shown in FIG. 1: in step S110, acquiring screen space coordinates of key points of a preset part, and acquiring a real distance of the key points of the preset part relative to a photographing device; in step S120, combining the real distance with the screen space coordinates to determine three-dimensional coordinates of the key points of the preset part in a virtual world; and in step S130, determining, according to the three-dimensional coordinates, a spatial relationship between the key points of the preset part and a virtual object in the virtual world, and controlling, based on the spatial relationship, the key points of the preset part to interact with the virtual object.
The storage unit 1020 may include a readable medium in the form of a volatile storage unit, such as a random access memory (RAM) 10201 and/or a cache 10202, and may further include a read-only memory (ROM) 10203.

The storage unit 1020 may also include a program/utility 10204 having a set of (at least one) program modules 10205, including but not limited to: an operating system, one or more application programs, other program modules, and program data; each, or some combination, of these examples may include an implementation of a network environment.

The bus 1030 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.

The display unit 1040 may be a display with a display function, to present, through the display, the processing result obtained by the processing unit 1010 executing the method in this exemplary embodiment. The display includes, but is not limited to, a liquid crystal display or other displays.

The electronic device 1000 may also communicate with one or more external devices 1200 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (such as a router, a modem, etc.) that enables the electronic device 1000 to communicate with one or more other computing devices. Such communication may take place via an input/output (I/O) interface 1050. Furthermore, the electronic device 1000 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 via the bus 1030. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored. In some possible implementations, various aspects of the present invention may also be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification.

Referring to FIG. 11, a program product 1100 for implementing the above method according to an embodiment of the present invention is described; it may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium, which can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.

Program code contained on a readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.

Program code for performing operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a standalone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. In the case involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).

In addition, the above drawings are merely schematic illustrations of the processing included in the method according to exemplary embodiments of the present invention and are not intended to be limiting. It is easy to understand that the processing shown in the above drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be performed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the appended claims.
It should be understood that the present disclosure is not limited to the precise structures that have been described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

  1. An interaction control method, comprising:
    acquiring screen space coordinates of key points of a preset part, and acquiring a real distance of the key points of the preset part relative to a photographing device;
    combining the real distance with the screen space coordinates to determine three-dimensional coordinates of the key points of the preset part in a virtual world; and
    determining, according to the three-dimensional coordinates, a spatial relationship between the key points of the preset part and a virtual object in the virtual world, and controlling, based on the spatial relationship, the key points of the preset part to interact with the virtual object.
  2. The interaction control method according to claim 1, wherein acquiring the screen space coordinates of the key points of the preset part comprises:
    acquiring a first image containing the preset part captured by a monocular camera; and
    performing key point detection on the first image to obtain the screen space coordinates of the key points of the preset part.
  3. The interaction control method according to claim 2, wherein performing key point detection on the first image to obtain the screen space coordinates of the key points of the preset part comprises:
    processing the first image through a trained convolutional neural network model to obtain the key points of the preset part; and
    performing regression processing on the key points of the preset part to obtain position information of the key points of the preset part and using the position information as the screen space coordinates.
  4. The interaction control method according to claim 2, wherein the photographing device comprises a depth camera, and acquiring the real distance of the key points of the preset part relative to the photographing device comprises:
    acquiring a second image containing the preset part captured by the depth camera;
    performing an alignment operation on the first image and the second image; and
    sampling the aligned second image at the screen space coordinates to obtain the real distance from the key points of the preset part to the depth camera.
  5. The interaction control method according to claim 1, wherein combining the real distance with the screen space coordinates to determine the three-dimensional coordinates of the key points of the preset part in the virtual world comprises:
    obtaining three-dimensional coordinates of the key points of the preset part in a projection space according to the real distance and the screen space coordinates;
    determining a projection matrix according to a field of view of the photographing device; and
    converting the three-dimensional coordinates in the projection space into the three-dimensional coordinates in the virtual world based on the projection matrix.
  6. The interaction control method according to claim 1, wherein determining, according to the three-dimensional coordinates, the spatial relationship between the key points of the preset part and the virtual object in the virtual world, and controlling, based on the spatial relationship, the key points of the preset part to interact with the virtual object comprises:
    acquiring the three-dimensional coordinates, in the virtual world, of the key points of the preset part that interact with the virtual object;
    calculating a distance between the three-dimensional coordinates and coordinates of the virtual object; and
    if the distance satisfies a preset distance, triggering the key points of the preset part to interact with the virtual object.
  7. The interaction control method according to claim 6, wherein triggering the key points of the preset part to interact with the virtual object comprises:
    recognizing a current action of the key points of the preset part; and
    matching the current action against a plurality of preset actions, and responding, according to the matching result, to the current action to interact with the virtual object, wherein the plurality of preset actions correspond one-to-one with interaction operations.
  8. An interaction control apparatus, comprising:
    a parameter acquisition module, configured to acquire screen space coordinates of key points of a preset part, and acquire a real distance of the key points of the preset part relative to a photographing device;
    a three-dimensional coordinate calculation module, configured to combine the real distance with the screen space coordinates to determine three-dimensional coordinates of the key points of the preset part in a virtual world; and
    an interaction execution module, configured to determine, according to the three-dimensional coordinates, a spatial relationship between the key points of the preset part and a virtual object in the virtual world, and control, based on the spatial relationship, the key points of the preset part to interact with the virtual object.
  9. An electronic device, comprising:
    a processor; and
    a memory for storing executable instructions of the processor;
    wherein the processor is configured to perform, by executing the executable instructions, the interaction control method according to any one of claims 1-7.
  10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the interaction control method according to any one of claims 1-7.
PCT/CN2020/089448 2019-05-14 2020-05-09 Interactive control method and apparatus, electronic device and storage medium WO2020228643A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20805347.0A EP3971685A4 (en) 2019-05-14 2020-05-09 Interactive control method and apparatus, electronic device and storage medium
US17/523,265 US20220066545A1 (en) 2019-05-14 2021-11-10 Interactive control method and apparatus, electronic device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910399073.5A CN111949111B (zh) 2019-05-14 2019-05-14 Interactive control method and apparatus, electronic device and storage medium
CN201910399073.5 2019-05-14

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/523,265 Continuation US20220066545A1 (en) 2019-05-14 2021-11-10 Interactive control method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2020228643A1 true WO2020228643A1 (zh) 2020-11-19

Family

ID=73289984

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/089448 WO2020228643A1 (zh) 2019-05-14 2020-05-09 交互控制方法、装置、电子设备及存储介质

Country Status (4)

Country Link
US (1) US20220066545A1 (zh)
EP (1) EP3971685A4 (zh)
CN (1) CN111949111B (zh)
WO (1) WO2020228643A1 (zh)





Also Published As

Publication number Publication date
EP3971685A4 (en) 2022-06-29
CN111949111B (zh) 2022-04-26
CN111949111A (zh) 2020-11-17
EP3971685A1 (en) 2022-03-23
US20220066545A1 (en) 2022-03-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20805347; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2020805347; Country of ref document: EP; Effective date: 20211213)