CN114332416B - Image processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN114332416B
Authority
CN
China
Prior art keywords
images
image
pose information
information
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111444802.8A
Other languages
Chinese (zh)
Other versions
CN114332416A (en)
Inventor
宫延河
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111444802.8A priority Critical patent/CN114332416B/en
Publication of CN114332416A publication Critical patent/CN114332416A/en
Application granted granted Critical
Publication of CN114332416B publication Critical patent/CN114332416B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The present disclosure provides an image processing method, apparatus, device and storage medium, which relate to the field of computers, and in particular to the field of artificial intelligence such as augmented reality, computer vision, deep learning and the like. The image processing method comprises the following steps: determining an image to be selected in a plurality of images of the real world; acquiring fused pose information based on the pose information of the plurality of images; transforming preset virtual information based on the fusion pose information to obtain transformed virtual information; and displaying the transformed virtual information on the image to be selected. The present disclosure can improve image stability.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computers, and in particular, to the field of artificial intelligence, such as augmented reality, computer vision, and deep learning, and in particular, to an image processing method, apparatus, device, and storage medium.
Background
Augmented Reality (AR) technology is a technology that fuses virtual information with the real world, and implements "augmentation" of the real world. AR technology may be applied in many scenarios, such as virtual shoe fitting scenarios.
In the related art, under a virtual shoe fitting scene, the image stability is poor.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus, device, and storage medium.
According to an aspect of the present disclosure, there is provided an image processing method including: determining an image to be selected in a plurality of images of the real world; acquiring fused pose information based on the pose information of the plurality of images; transforming preset virtual information based on the fusion pose information to obtain transformed virtual information; and displaying the transformed virtual information on the image to be selected.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: a determining module, configured to determine an image to be selected in a plurality of images of the real world; a fusion module, configured to obtain fused pose information based on the pose information of the plurality of images; a transformation module, configured to transform preset virtual information based on the fused pose information to obtain transformed virtual information; and a display module, configured to display the transformed virtual information on the image to be selected.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to any one of the above aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the above aspects.
According to the technical scheme of the disclosure, the image stability can be improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic illustration according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a seventh embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an electronic device for implementing any one of the image processing methods of the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure, which provides an image processing method including:
101. an image to be selected of a plurality of images of the real world is determined.
102. And acquiring fusion pose information based on the pose information of the plurality of images.
103. And transforming the preset virtual information based on the fusion pose information to obtain transformed virtual information.
104. And displaying the transformed virtual information on the image to be selected.
An Application (APP) can be installed on an electronic device used by a user, and the APP on the electronic device can be called a client. The APP may have an AR function, such as an AR shoe fitting function.
The electronic devices used by the user may include various terminal devices, smart home devices, wearable devices, and the like, where the terminal devices are, for example, mobile phones, tablet computers, and the like, the smart home devices are, for example, smart speakers and the like, and the wearable devices are, for example, smart watches, smart glasses, and the like.
As shown in fig. 2, taking an electronic device used by a user as a mobile phone 201 as an example, the mobile phone may have a camera, and an image of the real world may be acquired through the camera of the mobile phone.
In the technical scheme of the disclosure, the processes of collecting, storing, using, processing, transmitting, providing, disclosing and the like of the personal information of the related user all accord with the regulations of related laws and regulations, and do not violate the common customs of public order.
The real-world image refers to an image of the real world to be fused with virtual information, and is, for example, an image of a user's foot, taking a virtual shoe fitting as an example.
The camera can have a video function; by starting the video function, the mobile phone can obtain a plurality of foot images, which may be continuous multi-frame images in the video.
As shown in fig. 2, for example, when the mobile phone interacts with the cloud, the mobile phone 201 may send a plurality of images to the server 202 after acquiring the plurality of images through the camera, and the server may be a local server or a cloud.
After receiving the multiple images, the server can perform pose estimation on each image in the multiple images to obtain pose information of each image.
Pose refers to position and attitude (orientation), and can be expressed as three-dimensional coordinates (x, y, z) and three angles (pitch, yaw, roll).
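As a concrete illustration, such a 6-DoF pose can be held in a simple structure. The following sketch is hypothetical; the patent does not prescribe any particular data layout:

```python
from dataclasses import dataclass

@dataclass
class Pose:
    # position: three-dimensional coordinates
    x: float
    y: float
    z: float
    # attitude: three rotation angles, e.g. in degrees
    pitch: float
    yaw: float
    roll: float
```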
After the pose information of each image is obtained, the pose information can be fused to obtain fused pose information.
Fusion, which may also be referred to as information fusion or data fusion, refers to using computer technology to associate, correlate, and integrate the data and information obtained from one or more information sources, in order to complete required decision-making and estimation tasks.
The fusion pose information of this embodiment is obtained from a plurality of pose information.
The virtual information refers to virtual information to be integrated into an image of the real world, and the virtual information may be a virtual three-dimensional model of a shoe, taking a virtual shoe try as an example.
The virtual information may have a set initial pose at the initial time, and the pose of the virtual information may be adjusted based on the fusion pose information, that is, the virtual information may be transformed, in order to better fuse with the image of the real world.
The server side obtains the transformed virtual information, and can feed the transformed virtual information back to the client side to be displayed on the client side.
It can be understood that if the mobile phone has sufficient performance, the mobile phone may also perform the determination of the image to be selected, the calculation and fusion of the pose information, the transformation processing of the virtual information, and the like locally.
After the virtual information is displayed on the image to be selected, the fusion of the virtual information and the real world is realized, and the AR function is realized.
In the embodiment of the disclosure, the pose information of a plurality of images is fused, the virtual information is transformed based on the fused pose information, and the transformed virtual information is displayed on the image of the real world, so that the image stability can be improved.
In some embodiments, the determining the images to be selected in the real-world plurality of images includes: and taking images with a second preset number which are separated from the current image in a first preset number of images before the current image as the images to be selected, wherein the second preset number is less than or equal to the first preset number.
The multi-frame image may be a continuous multi-frame image, and for example, a camera may be used to collect the continuous multi-frame image.
Suppose the current image is I_T and the first preset number is N; the continuous multi-frame images are then: I_{T-N}, I_{T-N+1}, ..., I_{T-1}, I_T.
Assuming the second preset number is M, the image to be selected is I_{T-M}.
Here M and N are positive integers, and M is less than or equal to N.
By selecting the image before the current image as the image to be selected, the information before and after the image to be selected is fused by fusing the pose information for the image to be selected, so that the stability can be improved.
In some embodiments, the first predetermined number is an even number, and the second predetermined number is half of the first predetermined number.
For example, referring to fig. 3, N = 4 may be selected, that is, the current image and the 4 frames of images before it are obtained as the continuous multi-frame images, and M = N/2 = 2 may be selected, that is, the 2nd frame of image before the current image is selected as the image to be selected.
Because the second preset number is half of the first preset number, for the image to be selected, the image is located in the middle of the multi-frame image, consistent front and back information is fused, validity of the fused pose information is improved, and stability of the image is improved.
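The candidate selection described above can be sketched as a small helper. The function name and frame representation are hypothetical; only the index arithmetic follows the patent's N/M convention:

```python
def select_candidate(frames, m=None):
    """Pick the frame that lies m positions before the current frame.

    frames: consecutive frames ordered oldest -> newest; the last one is
    the current image I_T (so len(frames) == N + 1).
    m: the second preset number M; defaults to N // 2 (the middle frame).
    """
    n = len(frames) - 1          # first preset number N
    if m is None:
        m = n // 2               # M = N / 2 when N is even
    if not 0 <= m <= n:
        raise ValueError("require 0 <= M <= N")
    return frames[n - m]         # the image to be selected, I_{T-M}
```

For the 5-frame example in fig. 3, `select_candidate` with the defaults returns the middle frame I_{T-2}.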
In some embodiments, the obtaining fused pose information based on pose information of the plurality of images includes: and carrying out weighted summation on the pose information of the plurality of images to obtain the fusion pose information.
The fusion pose information may be a weighted sum of the pose information of each image.
The calculation formula may be as follows:

P = Σ_{i=1}^{L} w_i · P_i

where P is the fused pose information, P_i is the pose information of the i-th frame image, w_i is the weight value of the i-th frame image, and i = 1, ..., L.
Wherein a plurality of images may be stored in a queue of length L, L being selectable to be 5.
w_i may be predetermined.
P_1 to P_5 can be determined by pose estimation.
Pose estimation is one of the core technologies in the field of computer vision, with wide application in estimating the pose of a robot from a vision sensor for control, in robot navigation, in augmented reality, and elsewhere. The basis of pose estimation is finding corresponding points between the real world and the image projection. A pose estimation method is then chosen according to the type of the point pairs, such as 2D-2D, 2D-3D, or 3D-3D. Methods for the same type of point pairs can further be divided into algebraic methods, such as the Direct Linear Transform (DLT), and nonlinear-optimization-based methods, such as Bundle Adjustment (BA).
By weighting and summing the pose information of a plurality of images, the fusion pose information can be simply, conveniently and quickly obtained.
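A minimal sketch of the weighted fusion, treating each pose as a 6-element vector. Normalizing by the sum of the weights is an assumption made here so the result stays on the same scale as the inputs; the patent's formula image is not reproduced in the text:

```python
def fuse_poses(poses, weights):
    """Weighted sum of per-frame pose vectors.

    poses: list of L pose vectors, each e.g. (x, y, z, pitch, yaw, roll).
    weights: one scalar weight per frame.
    """
    total = float(sum(weights))
    dim = len(poses[0])
    # component-wise weighted average across the L frames
    return [
        sum(w * p[k] for p, w in zip(poses, weights)) / total
        for k in range(dim)
    ]
```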
In some embodiments, the weighted summation is based on weight values corresponding to the plurality of images, the method further comprising: determining weight values corresponding to the plurality of images, wherein the weight values are in inverse proportion to the number of intervals, and the number of the intervals is the number of images at intervals between the images corresponding to the weight values and the images to be selected.
That is, the weight value of an image closer to the image to be selected is larger, and vice versa, with respect to the image to be selected.
The specific calculation formula may be:

w_i = 1 / (|i - t| + 1), i = 1, ..., L

where i is the image index, w_i is the weight value of the i-th frame image, L is the number of the plurality of images, and t is the index of the image to be selected, which can generally be chosen as t = ⌊L/2⌋ + 1. Here |·| denotes the absolute value and ⌊·⌋ denotes rounding down.
Based on the 5 frames of images shown in fig. 3, the relationship between their weight values can be as shown in fig. 4, where the size of a weight value is represented by its height: the weight value of the image to be selected is the largest, and the weight values of the images before and after it decrease in turn.
Through the fact that the weighted value is in inverse proportion to the interval number, more accurate fusion pose information can be obtained, and therefore the fusion effect of the virtual information and the image of the real world is improved.
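The inverse-distance weighting above can be sketched as follows. The exact formula is reconstructed from the surrounding text (the original equation appears only as an image), so treat it as an assumption:

```python
def frame_weights(length):
    """Weight per frame: inversely proportional to the distance from
    the candidate (middle) frame, largest at the candidate itself."""
    t = length // 2 + 1  # 1-based index of the candidate frame, t = floor(L/2) + 1
    return [1.0 / (abs(i - t) + 1) for i in range(1, length + 1)]
```

For a queue of length 5 this yields weights 1/3, 1/2, 1, 1/2, 1/3, matching the symmetric fall-off sketched in fig. 4.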
In some embodiments, the real-world image is a foot image and the virtual information is a virtual three-dimensional model of the shoe.
That is, the method of the present embodiment may be applied to a virtual shoe fitting scenario.
The following description is directed to a virtual shoe fitting scenario.
Fig. 5 is a schematic diagram according to a fifth embodiment of the present disclosure, which provides an image processing method, and the embodiment takes a virtual shoe fitting scene as an example, and the method includes:
501. after starting the AR shoe test, continuously collecting multiple frames of foot images.
For example, the user can install the APP with the AR shoe fitting function on the electronic device used by the user, and start the AR shoe fitting function in the APP.
After the AR shoe fitting function is started, the user can align the camera on the electronic equipment to the foot of the user, and therefore the camera on the electronic equipment is used for collecting images of the foot.
When the foot images are collected, the foot images can be continuously collected, so that a plurality of frames of foot images are obtained.
502. And storing a plurality of frames of foot images.
503. And carrying out pose estimation on each frame of foot image in the multiple frames of foot images to obtain pose information of each frame of foot image.
504. And fusing the pose information of the foot images of each frame to obtain fused pose information.
505. And carrying out transformation processing on the virtual shoe three-dimensional model based on the fusion pose information so as to obtain a transformed shoe three-dimensional model.
The shoe three-dimensional model can be set to an initial pose, and the pose of the shoe three-dimensional model can be adjusted based on pose information of the foot images for better fusion with the foot images.
In the related art, as shown in the upper part of fig. 6, a shoe three-dimensional model (represented by a triangular shape) is adjusted based on pose information of a single frame image, whereas in the present embodiment, referring to the lower part of fig. 6, fusion pose information is obtained based on pose information of a plurality of frame images, and then, the shoe three-dimensional model is adjusted based on the fusion pose information.
When the shoe three-dimensional model is adjusted based on the pose information (whether the pose information of a single frame image or the fused pose information), the adjustment can be realized by adopting a correlation technique, for example, a transformation matrix (the matrix is generally a 4 x 4 matrix) can be obtained based on the pose information, and the transformation matrix is multiplied on the basis of the initial pose of the shoe three-dimensional model to obtain the transformed shoe three-dimensional model.
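The matrix step can be sketched with NumPy as below. The axis conventions and Euler-angle order are assumptions made for illustration; a real pipeline must match its own conventions:

```python
import numpy as np

def pose_to_matrix(x, y, z, pitch, yaw, roll):
    """Build a 4 x 4 homogeneous transform from a 6-DoF pose.
    Angles in radians; rotation order assumed roll (X), pitch (Y), yaw (Z)."""
    cx, sx = np.cos(roll), np.sin(roll)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cz, sz = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    m = np.eye(4)
    m[:3, :3] = rz @ ry @ rx   # combined rotation
    m[:3, 3] = [x, y, z]       # translation
    return m

def transform_vertices(vertices, matrix):
    """Apply a 4 x 4 transform to an (N, 3) array of model vertices."""
    v = np.hstack([vertices, np.ones((len(vertices), 1))])  # homogeneous coords
    return (v @ matrix.T)[:, :3]
```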
By adjusting the shoe three-dimensional model based on the fusion pose information, the accuracy of the shoe three-dimensional model can be improved due to the reference of more information.
506. And determining an image to be selected in the plurality of frames of foot images.
For example, the stored multi-frame images include 5 frames of images, and an image of an intermediate frame may be selected as the image to be selected.
507. And displaying the transformed shoe three-dimensional model on the image to be selected.
The shoe three-dimensional model can be rendered by a rendering module to be rendered on the image to be selected.
In the related art, as shown in the upper part of fig. 6, the shoe three-dimensional model is displayed in the current image (I_T). However, displaying the shoe three-dimensional model in this way may cause it to shake, that is, the image stability is poor.
In the present embodiment, as shown in the lower part of fig. 6, the shoe three-dimensional model is displayed in a certain image before the current image (I_{T-2}), and in this way the stability of the image can be improved.
It is understood that the contents that are not described in detail in this embodiment may be referred to the relevant description in other embodiments.
It should be understood that the timing relationship between the above steps is not limited, if there is no necessary timing relationship.
In this embodiment, the shoe three-dimensional model is adjusted based on the fused pose information; because more comprehensive information is used as an aid, the accuracy and stability of the shoe three-dimensional model can be improved. In addition, the shoe three-dimensional model is displayed on an image to be selected that lies before the current image, so the image to be selected can draw on the information of both earlier and later images, which can further improve stability.
Fig. 7 is a schematic diagram of an eighth embodiment of the present disclosure, which provides an image processing apparatus 700, including: a determination module 701, a fusion module 702, a transformation module 703 and a display module 704.
The determining module 701 is configured to determine an image to be selected from a plurality of images of a real world; the fusion module 702 is configured to obtain fusion pose information based on the pose information of the plurality of images; the transformation module 703 is configured to transform preset virtual information based on the fusion pose information to obtain transformed virtual information; the display module 704 is configured to display the transformed virtual information on the image to be selected.
In some embodiments, the plurality of images are a plurality of frames of images in the real-world video stream, where the plurality of frames of images include a current image and a first preset number of images before the current image, and the determining module 701 is further configured to: and taking images with a second preset number which are separated from the current image in a first preset number of images before the current image as the images to be selected, wherein the second preset number is less than or equal to the first preset number.
In some embodiments, the first predetermined number is an even number, and the second predetermined number is half of the first predetermined number.
In some embodiments, the fusion module 702 is further configured to: and carrying out weighted summation on the pose information of the plurality of images to obtain the fusion pose information.
In some embodiments, the weighted summation is based on weight values corresponding to the plurality of images, and the apparatus further comprises: an obtaining module, configured to determine weight values corresponding to the multiple images, where the weight values are inversely proportional to a number of intervals, and the number of intervals is a number of images in intervals between an image corresponding to the weight value and the image to be selected.
In some embodiments, the real-world image is a foot image and the virtual information is a virtual three-dimensional model of the shoe.
In the embodiment of the disclosure, the pose information of a plurality of images is fused, the virtual information is transformed based on the fused pose information, and the transformed virtual information is displayed on the image of the real world, so that the image stability can be improved.
It is to be understood that in the disclosed embodiments, the same or similar contents in different embodiments may be mutually referred to.
It is to be understood that "first", "second", and the like in the embodiments of the present disclosure are used for distinction only, and do not indicate the degree of importance, the order of timing, and the like.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 can also be stored. The calculation unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
A number of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 801 executes the respective methods and processes described above, such as the image processing method. For example, in some embodiments, the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the image processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server ("VPS") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure can be achieved; the present disclosure is not limited in this respect.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (12)

1. An image processing method comprising:
determining an image to be selected in a plurality of images of the real world;
obtaining fused pose information based on the pose information of the plurality of images;
transforming preset virtual information based on the fused pose information to obtain transformed virtual information, wherein the virtual information is to be merged into the image of the real world;
displaying the transformed virtual information on the image to be selected;
wherein the determining of the image to be selected in the plurality of images of the real world comprises:
taking, as the image to be selected, an image that is separated from the current image by a second preset number of images among a first preset number of images preceding the current image, wherein the second preset number is less than or equal to the first preset number.
2. The method of claim 1, wherein the first predetermined number is an even number and the second predetermined number is half of the first predetermined number.
3. The method of claim 1, wherein the obtaining fused pose information based on pose information for the plurality of images comprises:
performing weighted summation on the pose information of the plurality of images to obtain the fused pose information.
4. The method of claim 3, wherein the weighted summation is based on corresponding weight values of the plurality of images, the method further comprising:
determining weight values corresponding to the plurality of images, wherein each weight value is inversely proportional to an interval number, the interval number being the number of images between the image corresponding to that weight value and the image to be selected.
5. The method according to any one of claims 1-4, wherein the image of the real world is an image of a foot and the virtual information is a virtual three-dimensional model of a shoe.
6. An image processing apparatus comprising:
a determining module for determining an image to be selected in a plurality of images of the real world;
a fusion module for obtaining fusion pose information based on the pose information of the plurality of images;
a transformation module for transforming preset virtual information based on the fusion pose information to obtain transformed virtual information, wherein the virtual information is to be merged into the image of the real world;
a display module for displaying the transformed virtual information on the image to be selected; wherein the plurality of images are multi-frame images in a real-world video stream, the multi-frame images include a current image and a first preset number of images before the current image, and the determining module is further configured to:
take, as the images to be selected, the current image and an image that is separated from the current image by a second preset number of images among the first preset number of images preceding the current image, wherein the second preset number is less than or equal to the first preset number.
7. The apparatus of claim 6, wherein the first predetermined number is an even number and the second predetermined number is half of the first predetermined number.
8. The apparatus of claim 6, wherein the fusion module is further to:
performing weighted summation on the pose information of the plurality of images to obtain the fusion pose information.
9. The apparatus of claim 8, wherein the weighted sum is based on weight values corresponding to the plurality of images, the apparatus further comprising:
an obtaining module, configured to determine weight values corresponding to the plurality of images, wherein each weight value is inversely proportional to an interval number, the interval number being the number of images between the image corresponding to that weight value and the image to be selected.
10. The apparatus according to any one of claims 6-9, wherein the image of the real world is an image of a foot and the virtual information is a virtual three-dimensional model of a shoe.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
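The frame-selection and pose-fusion scheme of claims 1-4 can be illustrated with a short sketch. This is not part of the claims or the patented implementation; it is a minimal, hypothetical Python rendering in which all function and variable names are illustrative assumptions: the candidate image sits a second preset number of frames before the current frame, and each frame's pose is weighted inversely to its distance from that candidate before blending.

```python
import numpy as np

def select_candidate_index(first_preset: int, second_preset: int) -> int:
    """Claims 1-2 (sketch): among the `first_preset` images before the current
    image (indices 0..first_preset-1, with the current image at index
    `first_preset`), pick the image separated from the current image by
    `second_preset` frames."""
    assert second_preset <= first_preset
    return first_preset - second_preset

def fuse_pose(poses: np.ndarray, candidate: int) -> np.ndarray:
    """Claims 3-4 (sketch): weighted sum of per-image pose vectors, with each
    weight inversely proportional to the number of images between that image
    and the image to be selected (the +1 avoids division by zero at the
    candidate itself)."""
    intervals = np.abs(np.arange(len(poses)) - candidate)
    weights = 1.0 / (intervals + 1.0)
    weights /= weights.sum()   # normalize so the blend stays an average
    return weights @ poses     # (n,) @ (n, d) -> fused pose of shape (d,)
```

With a first preset number of 4 and a second preset number of 2 (half, per claim 2), the candidate is the middle frame of the window; the fused pose would then drive the transform of the virtual shoe model before it is displayed on the candidate image.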
CN202111444802.8A 2021-11-30 2021-11-30 Image processing method, device, equipment and storage medium Active CN114332416B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111444802.8A CN114332416B (en) 2021-11-30 2021-11-30 Image processing method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN114332416A (en) 2022-04-12
CN114332416B (en) 2022-11-29

Family

ID=81049013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111444802.8A Active CN114332416B (en) 2021-11-30 2021-11-30 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114332416B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862720A (en) * 2017-11-24 2018-03-30 北京华捷艾米科技有限公司 Pose optimization method and pose optimization system based on the fusion of more maps
CN108008817A (en) * 2017-12-01 2018-05-08 西安枭龙科技有限公司 A kind of method for realizing virtual reality fusion
CN110090440A (en) * 2019-04-30 2019-08-06 腾讯科技(深圳)有限公司 Virtual objects display methods, device, electronic equipment and storage medium
CN110102050A (en) * 2019-04-30 2019-08-09 腾讯科技(深圳)有限公司 Virtual objects display methods, device, electronic equipment and storage medium
CN110794962A (en) * 2019-10-18 2020-02-14 北京字节跳动网络技术有限公司 Information fusion method, device, terminal and storage medium
CN112132940A (en) * 2020-09-16 2020-12-25 北京市商汤科技开发有限公司 Display method, display device and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107340870B (en) * 2017-07-13 2020-09-01 深圳市未来感知科技有限公司 Virtual reality display system fusing VR and AR and implementation method thereof
CN111860252A (en) * 2020-07-09 2020-10-30 北京市商汤科技开发有限公司 Image processing method, apparatus and storage medium
CN112991556B (en) * 2021-05-12 2022-05-27 航天宏图信息技术股份有限公司 AR data display method and device, electronic equipment and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant