CN113538321A - Vision-based volume measurement method and terminal equipment - Google Patents

Vision-based volume measurement method and terminal equipment

Info

Publication number
CN113538321A
CN113538321A (application CN202010248237.7A)
Authority
CN
China
Prior art keywords
volume
bounding box
detected
color image
vision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010248237.7A
Other languages
Chinese (zh)
Inventor
马嘶鸣
戚向涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010248237.7A
Publication of CN113538321A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/0002 - Inspection of images, e.g. flaw detection
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01B - MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00 - Measuring arrangements characterised by the use of optical techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/60 - Analysis of geometric attributes
    • G06T7/62 - Analysis of geometric attributes of area, perimeter, diameter or volume
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/90 - Determination of colour characteristics
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a vision-based volume measurement method and a terminal device, applicable to the technical field of object detection. The method includes: acquiring a color image of an object to be measured and performing object recognition on the color image to obtain the object type of the object; acquiring a depth image of the object, generating a first three-dimensional point cloud of the object from the depth image, and generating a bounding box of the first three-dimensional point cloud; and cutting the bounding box according to the object type, calculating the volume of the cut bounding box, and taking the obtained volume as the volume measurement result of the object to be measured. Because the bounding box is cut according to the actual condition of the object before the volume is calculated, the accuracy of the volume measurement is improved.

Description

Vision-based volume measurement method and terminal equipment
Technical Field
The application belongs to the technical field of object detection, and in particular relates to a vision-based volume measurement method and a terminal device.
Background
Vision-based volume measurement refers to acquiring image or video information of an object and determining the volume of the object from that information. Compared with manual measurement, vision-based volume measurement is simpler and more convenient to perform and can be applied to a wider range of scenarios.
A conventional vision-based volume measurement scheme acquires a color image and a depth image of an object at the same time. The object is first recognized from the color image, and the corresponding region of the object is then located in the depth image according to the recognition result. A bounding box of the object is drawn from that location in the depth image, the volume of the bounding box is calculated, and the resulting bounding-box volume is taken as the volume of the object.
Although this conventional scheme can measure object volume, its measurement accuracy is low and it struggles to meet the accuracy requirements of practical application scenarios.
Disclosure of Invention
In view of this, embodiments of the present application provide a volume measurement method based on vision and a terminal device, which can solve the problem of low measurement accuracy when performing object volume measurement based on vision.
A first aspect of an embodiment of the present application provides a method for vision-based volume measurement, including:
and acquiring a color image of the object to be detected, and performing object identification on the color image to obtain the object type of the object to be detected. The method comprises the steps of obtaining a depth image of an object to be detected, generating a first three-dimensional point cloud of the object to be detected according to the depth image, and generating a bounding box of the first three-dimensional point cloud. And cutting the bounding box according to the type of the object, calculating the volume of the cut bounding box, and taking the obtained volume as the volume measurement result of the object to be measured.
In the embodiment of the application, since the three-dimensional point cloud of the object is extracted according to the depth image data, the positioning according to the color image is not needed. Therefore, the method is not influenced by the color image object recognition. And the drawing accuracy of the bounding box is improved. Meanwhile, the bounding box is further cut according to the actual condition of the object to be measured, so that the cut bounding box is more suitable for the actual shape of the object to be measured. Therefore, the volume calculation is carried out according to the cut bounding box, and the accuracy of the object volume measurement can be greatly improved.
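For illustration only, the following Python sketch shows one standard way to carry out the step of generating the three-dimensional point cloud from the depth image, assuming a pinhole camera model; the intrinsic parameters fx, fy, cx, cy and the function name are assumptions introduced here and are not taken from the application. A bounding box of the resulting point cloud can then be drawn, for example, from the per-axis minima and maxima of the points.

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project a depth map (H x W, metres) into an (N, 3) point cloud.

    Assumes a pinhole camera model with intrinsics fx, fy, cx, cy.
    Pixels with zero depth (no measurement) are discarded.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth.astype(np.float32)
    valid = z > 0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=-1)
```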
In a first possible implementation manner of the first aspect, the operation of cutting the bounding box includes:
and acquiring the three-dimensional space shape of the object to be detected according to the type of the object, and cutting the bounding box according to the three-dimensional space shape.
In the embodiment of the present application, the corresponding three-dimensional space shape is set in advance according to the actual shape of each object. And after the drawn bounding box is obtained, searching the three-dimensional space shape actually corresponding to the object to be detected, and cutting the bounding box according to the three-dimensional space shape. Until the shape of the bounding box is cut into the three-dimensional shape. And further, the finally obtained bounding box is combined with the object more closely, and the extraction of the object envelope information is more precise.
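As a hedged sketch of what this cutting step could look like in practice, one simple approximation is to scale the bounding-box volume by the fill ratio of the template shape, that is, the fraction of its own bounding box that the shape occupies; the object-type-to-shape mapping below is purely illustrative and not taken from the application.

```python
import math

# Illustrative mapping from recognized object type to a template 3D shape.
SHAPE_OF_TYPE = {
    "carton": "cuboid",
    "can":    "cylinder",
    "orange": "sphere",
}

# Ratio of the template shape's volume to the volume of its own bounding box.
SHAPE_FILL_RATIO = {
    "cuboid":   1.0,
    "cylinder": math.pi / 4.0,   # pi*r^2*h / (2r * 2r * h)
    "sphere":   math.pi / 6.0,   # (4/3)*pi*r^3 / (2r)^3
}

def cut_box_volume(box_dims, object_type: str) -> float:
    """Approximate the volume of the cut bounding box.

    box_dims: (length, width, height) of the uncut bounding box.
    The cut is approximated by scaling the box volume with the fill
    ratio of the template shape looked up for this object type.
    """
    l, w, h = box_dims
    shape = SHAPE_OF_TYPE.get(object_type, "cuboid")
    return l * w * h * SHAPE_FILL_RATIO[shape]
```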
In a second possible implementation manner of the first aspect, the operations of bounding box cutting and volume calculation include:
and acquiring first size information of the bounding box, and processing the first size information and the object type by using a volume estimation model trained in advance to obtain a volume measurement result. The volume estimation model is used for cutting the bounding box and calculating a volume measurement result according to the cut bounding box.
Considering that the shape of an object is often complex in practical situations, if technicians manually enumerate corresponding three-dimensional space shapes of different object types, the enumeration result is very limited. And for some objects with irregular spatial shapes, the enumeration method may have limitations. Therefore, the bounding box cutting is performed based on the matched three-dimensional space shape, and the obtained result is accurate and may be reduced sometimes. In the embodiment of the present application, a volume estimation model for adaptively learning and calculating the volume of an object according to the size of a bounding box and the kind of the object is constructed and trained in advance. Compared with manual enumeration, the neural network model can learn more types in a self-adaptive mode, and has stronger adaptability to irregular objects. The final volumetric measurement accuracy can be improved.
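The application does not disclose a concrete architecture for the volume estimation model; the PyTorch module below is only one plausible sketch, regressing a volume from the bounding-box dimensions concatenated with a one-hot object-type encoding. The layer sizes, the class count and the example values are assumptions.

```python
import torch
import torch.nn as nn

class VolumeEstimator(nn.Module):
    """Illustrative regression model: bounding-box size + object type -> volume."""

    def __init__(self, num_object_types: int, hidden: int = 64):
        super().__init__()
        # Input: 3 box dimensions concatenated with a one-hot object-type vector.
        self.net = nn.Sequential(
            nn.Linear(3 + num_object_types, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),   # predicted volume
        )

    def forward(self, box_dims: torch.Tensor, type_onehot: torch.Tensor) -> torch.Tensor:
        x = torch.cat([box_dims, type_onehot], dim=-1)
        return self.net(x).squeeze(-1)

# Example usage with a single sample and 10 assumed object types.
model = VolumeEstimator(num_object_types=10)
dims = torch.tensor([[0.30, 0.20, 0.15]])                        # metres
onehot = nn.functional.one_hot(torch.tensor([3]), 10).float()    # assumed class index
predicted_volume = model(dims, onehot)
```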
On the basis of the first or second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the result of the object identification further includes a first pixel point of the object to be detected in the color image.
Correspondingly, the operation of generating the first three-dimensional point cloud of the object to be detected according to the depth image comprises the following steps:
and screening out second pixel points within a preset range of the object to be detected from the color image to obtain a pixel point set consisting of the first pixel points and the second pixel points. And generating a second three-dimensional point cloud corresponding to the depth image, and screening a third three-dimensional point cloud corresponding to the pixel point set from the second three-dimensional point cloud. And screening the first three-dimensional point cloud corresponding to the object to be detected from the third three-dimensional point cloud.
According to the embodiment of the application, after the first pixel point is obtained, a part of pixel points are expanded to the periphery of the object to be detected on the basis of the first pixel point. And then realize the extension to the object edge, prevent that edge pixel from losing. And finding out the third three-dimensional point cloud of the object area to be detected according to the expanded range. The third three-dimensional point cloud comprises points of the newly added environment after the edge expansion of the object to be detected. Therefore, the third three-dimensional point cloud needs to be screened for the first three-dimensional point cloud corresponding to the object to be detected, so that the object to be detected is accurately positioned.
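A possible concrete form of this expansion-and-screening step is sketched below, assuming the first pixel points are available as a binary mask aligned with the depth image and that the second point cloud is stored with one 3D point per pixel; the dilation radius and the median-depth tolerance are illustrative choices, not values from the application.

```python
import cv2
import numpy as np

def object_point_cloud(mask: np.ndarray, point_cloud: np.ndarray,
                       expand_px: int = 10, depth_tol: float = 0.15) -> np.ndarray:
    """Screen out the point cloud of the object to be measured.

    mask:        H x W uint8 mask of the first pixel points (1 = object).
    point_cloud: H x W x 3 array, one 3D point per pixel (the second point cloud).
    expand_px:   how far to expand around the object to recover lost edge pixels.
    depth_tol:   points whose depth deviates from the object's median depth by
                 more than this value (metres) are treated as background.
    """
    # Expand the mask outward so that edge pixels close to the background are kept.
    kernel = np.ones((2 * expand_px + 1, 2 * expand_px + 1), np.uint8)
    expanded = cv2.dilate(mask, kernel)

    # Third point cloud: points of the expanded region (object plus a ring of environment).
    third = point_cloud[expanded.astype(bool)]

    # First point cloud: keep only points close in depth to the object itself.
    median_z = np.median(point_cloud[mask.astype(bool)][:, 2])
    keep = np.abs(third[:, 2] - median_z) < depth_tol
    return third[keep]
```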
On the basis of the first possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the result of the object identification further includes an object contour shape of the object to be detected.
Correspondingly, the method for acquiring the three-dimensional space shape of the object to be detected according to the object type comprises the following steps:
and acquiring the three-dimensional space shape of the object to be detected according to the object type and the object outline graph.
Because the object to be measured is under different putting angles, in order to improve the accuracy of cutting, the cutting mode demand to the bounding box is different this moment. The contour shape of an object may be different for different shooting angles and/or pose of the object. The angle of the object relative to the shooting device when the object is actually shot can be theoretically recognized by the shape of the object in the color image. Therefore, when the three-dimensional space shape is set for each type of object, a plurality of three-dimensional space shapes can be set for the same type of object. The three-dimensional space shapes are substantially the same in shape, but different in angle of arrangement. Meanwhile, the corresponding relation between the three-dimensional space shape and the object outline shape is set. On the basis, the three-dimensional space shape actually corresponding to the object to be measured is screened out according to the object type and the object outline shape, and therefore the cutting accuracy can be improved.
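To illustrate the idea of keeping several orientation-specific template shapes per object type, the small lookup table below is keyed by object type and recognized contour shape; the entries are purely illustrative assumptions.

```python
# Illustrative table: (object type, contour shape in the color image) -> template 3D shape.
# The same object type maps to the same basic shape at different placement angles.
TEMPLATE_SHAPES = {
    ("bottle", "tall_rectangle"): "upright_cylinder",   # standing bottle
    ("bottle", "wide_rectangle"): "lying_cylinder",     # bottle lying on its side
    ("ball",   "circle"):         "sphere",
}

def lookup_template(object_type: str, contour_shape: str, default: str = "cuboid") -> str:
    """Select the three-dimensional template shape used to cut the bounding box."""
    return TEMPLATE_SHAPES.get((object_type, contour_shape), default)
```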
On the basis of the first possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, the result of the object identification further includes an object contour shape of the object to be detected.
Correspondingly, the bounding box is cut according to the three-dimensional space shape, and the method comprises the following steps:
and obtaining a cutting strategy corresponding to the contour shape of the object, and cutting the bounding box according to the cutting strategy until the shape of the cut bounding box is a three-dimensional space shape, thereby finishing cutting.
Based on the same principle as the fourth aspect, the embodiment of the present application sets a three-dimensional shape for the same type of object. However, the corresponding cutting strategy is set in advance according to the possible different placing angles of the object, namely, the angle from which the bounding box is cut. On the basis, the cutting strategy of the object to be detected can be realized according to the object outline shape of the object to be detected. And cutting the bounding box according to the cutting strategy. Thereby improving the accuracy of the cutting.
On the basis of the second possible implementation manner of the first aspect, in a sixth possible implementation manner of the first aspect, the result of the object identification further includes an object contour shape of the object to be detected.
Correspondingly, the processing of the first size information and the object type by using the pre-trained volume estimation model to obtain the volume measurement result includes:
and processing the first size information, the object outline shape and the object type by using the volume estimation model trained in advance to obtain a volume measurement result.
In the embodiment of the application, a volume estimation model for adaptively calculating the volume of the object according to the size of the bounding box, the type of the object and the outline shape of the object is constructed and trained in advance. Compared with manual enumeration, the neural network model can learn more types in a self-adaptive mode, and has stronger adaptability to irregular objects. The final volumetric measurement accuracy can be improved.
On the basis of any one of the second to sixth possible implementation manners of the first aspect, in a seventh possible implementation manner of the first aspect, the method further includes:
and responding to the first operation of the user, and executing the operation of acquiring the color image of the object to be measured until the volume measurement result of the food is obtained. And calculating nutrient component data and/or energy value data of the food according to the object type and the volume measurement result, and displaying and outputting and/or voice broadcasting the nutrient component data and/or the energy value data.
In embodiments of the present application, the corresponding nutrient data and/or energy values may be calculated based on visually measured food volumes. Therefore, the user does not need to touch food, manually look up tables and calculate data, and the effective management of the self dietary structure and the intake energy value can be realized.
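As a worked example of this calculation under assumed figures (the densities and per-100 g energy values below are illustrative, not data from the application; a real implementation would consult a food-composition database):

```python
# Illustrative nutrition table: density (g/cm^3) and energy (kcal per 100 g).
FOOD_DATA = {
    "apple":  {"density": 0.8, "kcal_per_100g": 52},
    "banana": {"density": 0.9, "kcal_per_100g": 89},
}

def food_energy_kcal(object_type: str, volume_cm3: float) -> float:
    """Estimate energy from the measured volume: volume -> mass -> energy."""
    data = FOOD_DATA[object_type]
    mass_g = volume_cm3 * data["density"]
    return mass_g * data["kcal_per_100g"] / 100.0

# e.g. a measured apple volume of 200 cm^3 gives roughly 83 kcal with these figures.
print(food_energy_kcal("apple", 200.0))
```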
On the basis of any one of the second to sixth possible implementation manners of the first aspect, in an eighth possible implementation manner of the first aspect, the method further includes:
and responding to the second operation of the user, and executing the operation of obtaining the color image of the object to be measured until an express volume measurement result is obtained. And calculating the freight price of the express delivery according to the volume measurement result.
In the embodiment of the application, the express delivery volume can be measured based on vision and the corresponding freight price can be calculated. Therefore, the method and the device can reduce the complexity of express price calculation operation, improve the calculation efficiency and improve the convenience of life of a user. Meanwhile, the worker is not required to contact the express delivery, and the safety of the worker can be guaranteed.
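Couriers commonly convert volume into a chargeable amount through a volumetric (dimensional) weight; the divisor of 6000 cm^3/kg and the per-kilogram rate in the sketch below are industry-style assumptions rather than values given in the application.

```python
def express_price(volume_cm3: float, actual_weight_kg: float = 0.0,
                  rate_per_kg: float = 12.0, divisor: float = 6000.0) -> float:
    """Charge on the larger of the actual weight and the volumetric weight."""
    volumetric_weight_kg = volume_cm3 / divisor
    chargeable_kg = max(actual_weight_kg, volumetric_weight_kg)
    return chargeable_kg * rate_per_kg

# A 30 x 20 x 15 cm parcel (9000 cm^3) weighing 1 kg is charged by its
# volumetric weight of 1.5 kg with these assumed parameters.
print(express_price(9000.0, 1.0))   # 18.0
```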
On the basis of any one of the second to sixth possible implementation manners of the first aspect, in a ninth possible implementation manner of the first aspect, the method further includes:
and executing the operation of obtaining the color image of the object to be measured until obtaining the volume measurement result of the commodity. The price of the commodity is calculated according to the volume measurement result.
In the embodiment of the application, the commodity volume can be measured based on vision and the corresponding price can be calculated. Therefore, the method and the device can reduce the complexity of commodity price calculation operation, improve the calculation efficiency and improve the convenience of life of users. Meanwhile, the safety of workers can be guaranteed without the need of contacting the commodities.
On the basis of any one of the second to sixth possible implementation manners of the first aspect, in a tenth possible implementation manner of the first aspect, the operation of calculating the volume of the cut bounding box includes:
and acquiring second size information of the cut bounding box, taking the second size information as the size information of the object to be measured, and calculating the volume of the cut bounding box according to the second size information.
Accordingly, the vision-based volumetric measurement method further comprises:
and responding to the third operation of the user, and executing the operation of acquiring the color image of the object to be measured until the volume measurement result and the second size information of the furniture are obtained. And responding to the fourth operation of the user, acquiring an image or a video of the preset space region, and displaying the furniture in an overlapped mode into the image or the video of the preset space region by utilizing the augmented reality technology according to the volume measurement result and the second size information.
In the embodiment of the application, after the user performs visual volume measurement on favorite furniture, three-dimensional object modeling can be performed according to the measured size and volume data. When the user needs to judge whether the furniture is appropriate, the user only needs to shoot or take a video aiming at the area of the furniture to be placed in the home. At this moment, the embodiment of the application can display furniture in an overlaid manner in the obtained image or video through the AR technology. The user can more intuitively see whether the size of the furniture meets the requirement. And meanwhile, whether the style of the furniture is proper or not can be judged. Therefore, the furniture layout planning method and the furniture layout planning device can greatly improve the efficiency of furniture layout design and scene layout planning.
On the basis of any one of the second to sixth possible implementation manners of the first aspect, in an eleventh possible implementation manner of the first aspect, the operation of calculating the volume of the cut bounding box, where the object to be measured is a human body, includes:
and acquiring second size information of the cut bounding box, taking the second size information as the size information of the object to be measured, and calculating the volume of the cut bounding box according to the second size information.
Accordingly, the vision-based volumetric measurement method further comprises:
and responding to the fifth operation of the user, and executing the operation of acquiring the color image of the object to be measured until the volume measurement result and the second size information of the human body are obtained. And performing virtual human body modeling according to the volume measurement result and the second size information to obtain a human body model corresponding to the human body, and creating a game role associated with the human body by using the human body model.
According to the embodiment of the application, the real stature of the user can be visually measured, and the game role can be automatically generated according to the stature condition of the user. Thereby making the match between the game character and the actual user higher. Therefore, the user can have stronger substituting feeling for the game role, and the game experience of the user is better.
A second aspect of embodiments of the present application provides a vision-based volumetric measurement device, comprising:
and the object identification module is used for acquiring a color image of the object to be detected and carrying out object identification on the color image to obtain the object type of the object to be detected.
The bounding box generating module is used for acquiring a depth image of the object to be detected, generating a first three-dimensional point cloud of the object to be detected according to the depth image and generating a bounding box of the first three-dimensional point cloud.
And the volume measurement module is used for cutting the bounding box according to the type of the object, calculating the volume of the cut bounding box, and taking the obtained volume as the volume measurement result of the object to be measured.
A third aspect of embodiments of the present application provides a terminal device, where the terminal device includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor executes the computer program to enable the terminal device to implement the steps of any one of the vision-based volume measurement methods in the first aspect.
A fourth aspect of an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes a terminal device to carry out the steps of the vision-based volume measurement method of any one of the above first aspects.
A fifth aspect of embodiments of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to perform the vision-based volume measurement method of any one of the first aspects described above.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
Fig. 1A is a schematic structural diagram of a mobile phone according to an embodiment of the present application;
fig. 1B is a block diagram of a software structure of a terminal device according to an embodiment of the present application;
FIG. 2A is a schematic flow chart of a vision-based volumetric measurement method according to an embodiment of the present application;
fig. 2B is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 2C is a schematic flow chart of a vision-based volumetric measurement method according to an embodiment of the present application;
fig. 2D is a schematic diagram of an application scenario provided in an embodiment of the present application;
fig. 2E is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 2F is a schematic flow chart of a vision-based volumetric measurement method according to an embodiment of the present application;
FIG. 2G is a schematic flow chart of a vision-based volumetric measurement method according to an embodiment of the present application;
fig. 2H is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a vision-based volume measuring device according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
For the convenience of understanding, the embodiments of the present application will be briefly described herein:
In a conventional vision-based volume measurement scheme, object recognition is first performed on the color image. The object is then located in the depth image according to the recognition result. Finally, a bounding box of the object is drawn according to the location in the depth image, and the volume of the bounding box is taken as the volume of the object. Although this can measure object volume to a certain extent, analysis shows that it has at least the following defects:
1. When recognizing the object in the color image, methods such as binary segmentation are mostly used to extract the pixel points of the object. In practice, however, such methods routinely lose object pixel points during extraction. For example, edge pixels whose gray level is close to that of the surrounding environment are easily classified as environment pixels and discarded. When the object is then located in the depth image based on the recognition result of the color image, information in the depth image is lost, the bounding box is drawn inaccurately, and the accuracy of the volume measurement result is reduced.
2. Bounding boxes have fairly regular spatial shapes, such as the commonly used AABB bounding box, bounding sphere, OBB bounding box and FDH bounding box. For irregular objects such as bananas or humanoid robots, the bounding volume is larger than the actual volume of the object, so volume measurement based directly on the bounding box inevitably has low accuracy.
To solve the above problems and improve the accuracy of object volume measurement, this embodiment of the application constructs in advance a corresponding three-dimensional space shape for the actual spatial shape of each kind of object. For a humanoid robot, for example, a humanoid three-dimensional shape including four limbs can be constructed; for a banana, a curved cylindrical shape can be constructed. On this basis, object recognition is first performed on the color image of the object to obtain the object type. At the same time, a three-dimensional point cloud of the object is obtained from the depth image, and the bounding box corresponding to the object is drawn from that point cloud. The actual three-dimensional space shape of the object is then determined from the object type, and the bounding box is cut so that its shape becomes the object's actual three-dimensional space shape. Finally, the volume of the cut bounding box is measured and taken as the object volume measurement result. Because the three-dimensional point cloud of the object is extracted from the depth image data, there is no need to locate the object through the color image, so the method is not affected by errors in color-image object recognition, and the bounding box is drawn more accurately. After the bounding box is obtained, it is further cut according to the actual three-dimensional space shape of the object, so that the final bounding box fits the object more closely and the object's envelope information is extracted more precisely. Calculating the volume from the cut bounding box therefore greatly improves the accuracy of the object volume measurement.
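For reference, the sketch below shows how the two kinds of bounding box mentioned above can be obtained from a point cloud with plain NumPy: an axis-aligned box (AABB) from per-axis extrema and an oriented box (OBB) from a principal component analysis of the points. This is a standard construction, not code from the application; either box still encloses empty space around an irregular object, which is exactly what the cutting step is meant to remove.

```python
import numpy as np

def aabb_volume(points: np.ndarray) -> float:
    """Volume of the axis-aligned bounding box of an (N, 3) point cloud."""
    extent = points.max(axis=0) - points.min(axis=0)
    return float(np.prod(extent))

def obb_volume(points: np.ndarray) -> float:
    """Volume of a PCA-oriented bounding box of an (N, 3) point cloud."""
    centered = points - points.mean(axis=0)
    # Principal axes of the point cloud (right singular vectors of the centered data).
    _, _, axes = np.linalg.svd(centered, full_matrices=False)
    rotated = centered @ axes.T          # coordinates in the principal-axis frame
    extent = rotated.max(axis=0) - rotated.min(axis=0)
    return float(np.prod(extent))
```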
The volume measurement method based on vision provided by the embodiment of the application can be applied to terminal equipment such as mobile phones, tablet computers and wearable equipment. At this time, the terminal device is an execution subject of the vision-based volume measurement method provided by the embodiment of the application. The embodiment of the present application does not set any limit to the specific type of the terminal device.
Hereinafter, taking the terminal device as a mobile phone as an example, fig. 1A shows a schematic structural diagram of the mobile phone 100.
The handset 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a SIM card interface 195, and the like. The sensor module 180 may include a gyroscope sensor 180A, an acceleration sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an ambient light sensor 180E, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, and a touch sensor 180K (of course, the mobile phone 100 may further include other sensors, such as a temperature sensor, a pressure sensor, a distance sensor, an air pressure sensor, a bone conduction sensor, and the like, which are not shown in the figure).
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the mobile phone 100. In other embodiments of the present application, the handset 100 may include more or fewer components than shown, or some components may be combined, some components may be separated, or a different arrangement of components may be used. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a Neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors. The controller may be a neural center and a command center of the cell phone 100, among others. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
The processor 110 may operate the vision-based volume measurement method provided in the embodiment of the present application, so as to improve the accuracy of the volume measurement of the object and improve the experience of the user. The processor 110 may include different devices, such as an integrated CPU and a GPU, and the CPU and the GPU may cooperate to execute the vision-based volume measurement method provided in the embodiment of the present application, for example, part of the algorithm in the vision-based volume measurement method is executed by the CPU, and another part of the algorithm is executed by the GPU, so as to obtain faster processing efficiency.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini-LED, a micro-LED, a micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the cell phone 100 may include 1 or N display screens 194, with N being a positive integer greater than 1. The display screen 194 may be used to display information input by or provided to the user as well as various graphical user interfaces (GUIs). For example, the display 194 may display a photograph, video, web page, or file. As another example, the display 194 may display a graphical user interface that includes a status bar, a concealable navigation bar, a time and weather widget, and icons of applications, such as a browser icon. The status bar includes the name of the operator (e.g., China Mobile), the mobile network (e.g., 4G), the time and the remaining power. The navigation bar includes a back key icon, a home key icon, and a forward key icon. Further, it is understood that in some embodiments, a Bluetooth icon, a Wi-Fi icon, an add-on icon, etc. may also be included in the status bar. It will also be appreciated that in other embodiments, a Dock bar may also be included in the graphical user interface, and the Dock bar may include commonly used application icons, etc. When the processor detects a touch event of a user's finger (or stylus, etc.) on an application icon, in response to the touch event, the user interface of the application corresponding to the icon is opened and displayed on the display 194.
In this embodiment, the display screen 194 may be an integrated flexible display screen, or may be a spliced display screen formed by two rigid screens and a flexible screen located between the two rigid screens. After the processor 110 executes the vision-based volume measurement method provided by the embodiment of the present application, the processor 110 may control an external audio output device to switch the output audio signal.
The mobile phone 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The cameras 193 (front camera or rear camera, or one camera may be both front camera and rear camera) are used to capture still images or video. In general, the camera 193 may include a photosensitive element such as a lens group including a plurality of lenses (convex lenses or concave lenses) for collecting an optical signal reflected by an object to be photographed and transferring the collected optical signal to an image sensor, and an image sensor. And the image sensor generates an original image of the object to be shot according to the optical signal. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the handset 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process digital image signals and other digital signals. For example, when the handset 100 is in frequency bin selection, the digital signal processor is used to perform fourier transform or the like on the frequency bin energy.
Video codecs are used to compress or decompress digital video. Handset 100 may support one or more video codecs. Thus, the handset 100 can play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the cellular phone 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. Wherein the storage program area may store an operating system, codes of application programs (such as a camera application, a WeChat application, etc.), and the like. The data storage area can store data created during the use of the mobile phone 100 (such as images, videos and the like acquired by a camera application), and the like.
The internal memory 121 may also store one or more computer programs 1310 corresponding to the vision-based volume measurement method provided by the embodiments of the present application. The one or more computer programs 1310 are stored in the memory 121 and configured to be executed by the one or more processors 110, and include instructions that can be used to perform the steps in the embodiments of fig. 2A-2H. The computer programs 1310 can include an account verification module 2111, a priority comparison module 2112 and a state synchronization module 2113. The account verification module 2111 is configured to authenticate system authentication accounts of other terminal devices in the local area network; the priority comparison module 2112 may be configured to compare the priority of an audio output request service with the priority of the current output service of the audio output device; and the state synchronization module 2113 may be configured to synchronize the device state of the audio output device currently accessed by the terminal device to another terminal device, or to synchronize the device state of an audio output device currently accessed by another device to the local device. When the code of the vision-based volume measurement method stored in the internal memory 121 is executed by the processor 110, the processor 110 may control the transmitting end to perform the processing of the color image and the depth image.
In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
Of course, the code of the vision-based volume measurement method provided by the embodiment of the present application may also be stored in the external memory. In this case, the processor 110 may execute the code of the vision-based volume measurement method stored in the external memory through the external memory interface 120, and the processor 110 may control the transmitting end to perform color image and depth image processing.
The function of the sensor module 180 is described below.
The gyro sensor 180A may be used to determine the motion attitude of the cellular phone 100. In some embodiments, the angular velocity of the handpiece 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180A. I.e., the gyro sensor 180A may be used to detect the current state of motion of the handset 100, such as shaking or standing still.
When the display screen in the embodiment of the present application is a foldable screen, the gyro sensor 180A may be used to detect a folding or unfolding operation acting on the display screen 194. The gyro sensor 180A may report the detected folding operation or unfolding operation as an event to the processor 110 to determine the folded state or unfolded state of the display screen 194.
The acceleration sensor 180B can detect the magnitude of acceleration of the cellular phone 100 in various directions (typically three axes); that is, the acceleration sensor 180B may also be used to detect the current motion state of the handset 100, such as shaking or being stationary. When the display screen in the embodiment of the present application is a foldable screen, the acceleration sensor 180B may be used to detect a folding or unfolding operation acting on the display screen 194. The acceleration sensor 180B may report the detected folding operation or unfolding operation as an event to the processor 110 to determine the folded state or unfolded state of the display screen 194.
The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The mobile phone emits infrared light outwards through the light emitting diode. The handset uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the handset. When insufficient reflected light is detected, the handset can determine that there are no objects near the handset. When the display screen in this embodiment of the application is a foldable screen, the proximity optical sensor 180G may be disposed on the first screen of the foldable display screen 194, and the proximity optical sensor 180G may detect a folding angle or an unfolding angle of the first screen and the second screen according to an optical path difference of the infrared signal.
The gyro sensor 180A (or the acceleration sensor 180B) may transmit the detected motion state information (such as an angular velocity) to the processor 110. The processor 110 determines whether the mobile phone is currently in the hand-held state or the tripod state (for example, when the angular velocity is not 0, it indicates that the mobile phone 100 is in the hand-held state) based on the motion state information.
The fingerprint sensor 180H is used to collect a fingerprint. The mobile phone 100 can utilize the collected fingerprint characteristics to unlock the fingerprint, access the application lock, take a photograph of the fingerprint, answer an incoming call with the fingerprint, and the like.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation applied thereto or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on the surface of the mobile phone 100, different from the position of the display 194.
Illustratively, the display screen 194 of the handset 100 displays a main interface that includes icons for a plurality of applications (e.g., a camera application, a WeChat application, etc.). The user clicks the icon of the camera application in the home interface through the touch sensor 180K, which triggers the processor 110 to start the camera application and open the camera 193. The display screen 194 displays an interface, such as a viewfinder interface, for the camera application.
The wireless communication function of the mobile phone 100 can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the handset 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including wireless communication of 2G/3G/4G/5G, etc. applied to the handset 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110. In the embodiment of the present application, the mobile communication module 150 may also be used for information interaction with other terminal devices.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication applied to the mobile phone 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves. In this embodiment, the wireless communication module 160 may be used to access the access point device, send a message to another terminal device, or receive a message corresponding to an audio output request sent by another terminal device.
In addition, the mobile phone 100 can implement an audio function through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc. The handset 100 may receive key 190 inputs, generating key signal inputs relating to user settings and function controls of the handset 100. The handset 100 can generate a vibration alert (e.g., an incoming call vibration alert) using the motor 191. The indicator 192 in the mobile phone 100 may be an indicator light, and may be used to indicate a charging status, a power change, or a message, a missed call, a notification, etc. The SIM card interface 195 in the handset 100 is used to connect a SIM card. The SIM card can be attached to and detached from the cellular phone 100 by being inserted into the SIM card interface 195 or being pulled out from the SIM card interface 195.
It should be understood that in practical applications, the mobile phone 100 may include more or less components than those shown in fig. 1A, and the embodiment of the present application is not limited thereto. The illustrated handset 100 is merely an example, and the handset 100 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The software system of the terminal device may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the invention takes an Android system with a layered architecture as an example, and exemplarily illustrates a software structure of a terminal device. Fig. 1B is a block diagram of a software configuration of a terminal device according to an embodiment of the present invention.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, an application layer, an application framework layer, an Android runtime (Android runtime) and system library, and a kernel layer from top to bottom.
The application layer may include a series of application packages.
As shown in fig. 1B, the application package may include applications such as phone, camera, gallery, calendar, phone call, map, navigation, WLAN, bluetooth, music, video, short message, etc.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 1B, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The telephone manager is used for providing a communication function of the terminal equipment. Such as management of call status (including on, off, etc.).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables the application to display notification information in the status bar, can be used to convey notification-type messages, can disappear automatically after a short dwell, and does not require user interaction. Such as a notification manager used to inform download completion, message alerts, etc. The notification manager may also be a notification that appears in the form of a chart or scroll bar text at the top status bar of the system, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is given, the terminal device vibrates, an indicator light flickers, and the like.
The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library comprises two parts: one part is a function which needs to be called by java language, and the other part is a core library of android.
The application layer and the application framework layer run in a virtual machine. And executing java files of the application program layer and the application program framework layer into a binary file by the virtual machine. The virtual machine is used for performing the functions of object life cycle management, stack management, thread management, safety and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), Media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer at least comprises a display driver, a camera driver, an audio driver, and a sensor driver.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Fig. 2A shows a flowchart of an implementation of a vision-based volume measurement method provided in an embodiment of the present application, which is detailed as follows:
S2011, acquiring a color image and a depth image of the object to be detected, and performing object identification on the object to be detected according to the color image to obtain the object type of the object to be detected.
The embodiment of the present application does not unduly limit the way in which the color image and the depth image are acquired; it may be determined according to the actual scene. For example, in some alternative embodiments, the terminal device serving as the execution subject may capture the color image and the depth image through a color camera and a depth camera that it carries. In this case, real-time measurement of the volume of the object to be measured can be realized. In other alternative embodiments, a stored color image and depth image may be read and processed as described in the embodiments of the present application. In this case, real-time or non-real-time volume measurement of the object to be measured can be realized depending on the storage time of the color image and the depth image; if the storage time is relatively far from the current time, a non-real-time volume measurement of the object is carried out. In still other alternative embodiments, the color image and the depth image may also be acquired by another device and then sent to the terminal device. In this case, acquiring the color image and the depth image in S2011 means receiving the color image and the depth image. The manner in which the other device acquires the images is not limited; the images may be captured by the other device, or may be read or received. The specific choice is determined by the actual scene.
On the other hand, the color image and the depth image in the embodiment of the present application may be photographs that were taken, or may be image frames extracted from recorded videos; the specific choice can be determined according to the actual scene. When they are photographs, they can be obtained through the acquisition paths described above. When they are image frames, the operations on the color image and the depth image in the acquisition paths described above are modified as follows: image frames are extracted from the video shot by the color camera to obtain the color image, and image frames are extracted from the video shot by the depth camera to obtain the depth image. In this case, the video may be shot by the terminal device itself, may be shot by another device, or may be stored in advance; the specific choice is determined by the actual scene.
The capture times of the color image and the depth image are explained as follows:
In the embodiment of the present application, the color image and the depth image are used for volume measurement of the same object. In practice, the object may move over time, and the angle at which the object is photographed may also vary. Therefore, in order to ensure the accuracy of the object volume measurement, the best choice is for the color image and the depth image to be captured at the same time. On the other hand, considering that the probability of the object moving or the photographing angle changing is generally small over a short period, the corresponding amount of movement and angle change is extremely small and may be 0. Therefore, in practical applications, a time difference threshold may also be set, for example 0 to 100 ms. When the difference between the capture times of the color image and the depth image is smaller than the time difference threshold, the two images are considered usable for object volume measurement, and the object recognition operation in S2011 may be performed. In the scene where the color image and the depth image are image frames extracted from videos shot by a color camera and a depth camera, the difference between the capture times of the two extracted image frames should likewise be smaller than the time difference threshold.
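A minimal sketch of such a capture-time check is shown below, assuming per-frame timestamps in milliseconds are available; the function name and the 100 ms default are illustrative assumptions, not values required by this application:

```python
# Sketch: decide whether a color frame and a depth frame are close enough
# in time to be used together for volume measurement.
# Timestamps are assumed to be in milliseconds; 100 ms is an example value.

MAX_TIME_DIFF_MS = 100

def frames_usable_together(color_ts_ms: float, depth_ts_ms: float,
                           max_diff_ms: float = MAX_TIME_DIFF_MS) -> bool:
    """Return True if the two frames may be treated as captured at the same time."""
    return abs(color_ts_ms - depth_ts_ms) <= max_diff_ms
```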
After obtaining the color image and the depth image, the embodiment of the present application performs object recognition on the color image to determine the actual type of the object. The classification rule for object types is not limited in the embodiment of the present application and can be set by technical personnel according to actual requirements. For example, in some alternative embodiments, each different object may be considered a separate object type; peaches, bananas, robots, mobile phones and tablet computers, for instance, would then be 5 different object categories. Object identification in this case needs to determine what the object to be detected actually is. In other alternative embodiments, objects may be classified according to their three-dimensional shapes, and objects with the same or similar three-dimensional shapes may be classified as the same type. For example, mobile phones and tablet computers are both cuboid in shape and can be classified as first-class objects, while sphere-like objects such as apples and peaches can be classified as second-class objects. On this basis, object identification only needs to identify the category to which the object to be detected belongs, such as a first-class object, a second-class object, or some other class; what the object actually is need not be recognized. For example, a peach does not need to be recognized as a peach, only as a second-class object. Meanwhile, in actual life the variety of objects is very large, whereas the types of objects for which volume identification is generally required are limited. Therefore, when classifying objects in advance and setting the subsequent three-dimensional space shapes, technical personnel can select a subset of objects according to actual requirements. The type and number of objects are not limited and can be set by the technical personnel.
Meanwhile, the specific object identification method is not limited too much in the embodiment of the application, and technicians can select or set the object identification method according to actual requirements. For example, in some alternative embodiments, some deep learning-based object detection algorithms may be selected to implement object recognition in the embodiments of the present application. In other alternative embodiments, images of different kinds of objects may be preset, and corresponding kind labels may be set. When object recognition is performed, matching of object images can be performed, and the type of the image with the highest similarity is used as the object type of the object to be detected in the embodiment of the application. Or the object recognition and the image segmentation of the object to be detected can be realized by adopting a binary segmentation method.
In the following description, the object type classification rule is assumed to be: each different object is treated as a separate object type.
S202, generating a first three-dimensional point cloud of the object to be detected according to the depth image, and generating a bounding box of the first three-dimensional point cloud.
The three-dimensional point cloud records information of an object in a point form after the object is scanned, wherein each point has a three-dimensional space coordinate. In the embodiment of the present application, scanning an object refers to analyzing a depth image of an object to be measured.
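For illustration, a minimal sketch of back-projecting a depth image into a three-dimensional point cloud under the standard pinhole camera model is given below; the intrinsic parameters fx, fy, cx, cy and the depth scale are assumptions that would come from the actual depth camera calibration rather than values given in this application:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a depth image (H x W, raw depth units) into an (N, 3)
    point cloud in the depth-camera coordinate system.

    fx, fy, cx, cy: pinhole intrinsics of the depth camera (assumed known).
    depth_scale:    factor converting raw depth values to meters (assumed).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32) * depth_scale
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]          # drop pixels with no depth reading
```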
In the embodiment of the present application, the depth image is processed to generate the three-dimensional point cloud data of the object to be measured, so as to obtain the coordinate data of the object to be measured in three-dimensional space. Note that the content contained in the depth image may include more than the object to be measured; for example, if the object to be measured is placed on a table, the table top will also appear in the depth image. Therefore, when generating the first three-dimensional point cloud of the object to be measured, the object to be measured needs to be distinguished from the other contents of the depth image. Two optional processing modes are as follows:
1. firstly, object positioning is carried out on the depth image, and then three-dimensional point cloud generation is carried out on the positioned image area where the object to be detected is located.
2. Firstly, three-dimensional point clouds of all contents of the depth image are generated, then the object to be detected is positioned, and the three-dimensional point clouds corresponding to the image area where the object to be detected is located are screened out.
The method for positioning the object to be measured is not limited herein, and can be set by a technician according to actual requirements. For example, some object recognition algorithms can be used to realize object recognition, and then the image area where the object to be detected is located is searched according to the recognition result.
After the first three-dimensional point cloud of the object to be detected is screened out, the embodiment of the application can generate the bounding box for the first three-dimensional point cloud, and all the points in the first three-dimensional point cloud are placed in the bounding box. The type and generation method of the bounding box are not limited herein, and can be set by a technician according to actual requirements. For example, in some alternative embodiments, a rectangular bounding box may be chosen as the bounding box in embodiments of the present application. Meanwhile, the generation method can be set to be that each surface of the rectangular bounding box is tangent to the edge of the three-dimensional point cloud. At this time, the bounding box generated in S202 is the minimum rectangular bounding box.
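As an illustration of the minimum rectangular bounding box described above, the following sketch computes an axis-aligned box whose faces are tangent to the extremes of the point cloud; it assumes the first three-dimensional point cloud is given as an (N, 3) NumPy array:

```python
import numpy as np

def min_axis_aligned_bounding_box(points: np.ndarray):
    """Return (min_corner, max_corner) of the smallest axis-aligned box
    enclosing all points; every face touches at least one point."""
    min_corner = points.min(axis=0)
    max_corner = points.max(axis=0)
    return min_corner, max_corner

def box_volume(min_corner, max_corner) -> float:
    """Volume of the (uncut) bounding box."""
    return float(np.prod(max_corner - min_corner))
```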
As an alternative embodiment of the present application, after the bounding box is generated, the bounding box may be rendered into a color image and displayed. So that the user can see the bounding box situation corresponding to the object in the actual color image. For example, reference may be made to fig. 2B, in which the object to be measured is a peach, and (a) part is an acquired color image, and (B) part is an image after the generated bounding box is displayed on the color image.
As an optional implementation manner for generating the first three-dimensional point cloud of the object to be detected, on the basis of processing mode 2, in the embodiment of the present application, when object identification is performed in S2011, the identification result of the object further includes: a first pixel point of the object to be detected.
Referring to fig. 2C, the operation of S2011 may be replaced by:
S2012, a color image and a depth image of the object to be detected are obtained, and the object to be detected is identified according to the color image, so that the object type and the first pixel point of the object to be detected are obtained.
Correspondingly, the generation process S202 of the first three-dimensional point cloud may be replaced by:
S2021, obtaining a second pixel point in a preset range of the object to be detected in the color image, and obtaining a pixel point set formed by the first pixel point and the second pixel point.
S2022, generating a second three-dimensional point cloud corresponding to the depth image, and screening a third three-dimensional point cloud corresponding to the pixel point set from the second three-dimensional point cloud.
S2023, screening the first three-dimensional point cloud corresponding to the object to be detected from the third three-dimensional point cloud.
Considering that the depth image may contain a lot of content, searching the whole image to locate the object to be detected involves a large workload. This is especially true for depth images that contain many pixel points of which only a few correspond to the object to be detected; in such cases the workload is large and the positioning accuracy of the object to be detected is also reduced. In order to improve the efficiency of object positioning, the embodiment of the present application may use the recognition result of the object in the color image to assist the object positioning in the depth image.
Specifically, when object identification is performed on the color image, the pixel points corresponding to the object in the color image, namely the first pixel points, can be determined. Because the depth image and the color image are both obtained by shooting the object to be detected, their captured contents largely coincide. On this basis, the contents of the two images can be aligned, and the corresponding three-dimensional point cloud in the depth image can then be screened out according to the pixel points determined in the color image. This enables rapid positioning of the object to be detected and improves object positioning efficiency.
On the other hand, when object identification and pixel point extraction are performed on the color image, pixel points at the edge of the object may be lost. Therefore, if the object in the depth image is rapidly located and the three-dimensional point cloud is screened directly according to the first pixel points determined by object identification in the color image, part of the points of the object may be lost, leading to an inaccurate screened three-dimensional point cloud. For example, referring to fig. 2D, in the embodiment of the present application the object to be detected is a peach; part (a) is the color image, and part (b) is the area image formed by the first pixel points of the peach obtained by object identification and binary segmentation. As can be seen from part (b), part of the pixel points at the edge of the peach are lost.
Therefore, after the first pixel points are obtained, a number of additional pixel points can be expanded around the object to be detected on the basis of the first pixel points. This extends the object edge and prevents edge pixel points from being lost. The specific preset expansion range is not limited here and can be set by technical personnel according to actual requirements. For example, in some alternative embodiments it may be set to 20 pixels. In this case, in S2021, the edge pixel points of the object to be detected are extended outward by 20 pixel points, and the second pixel points covered by the extension together with the first pixel points identified for the object are used as the pixel point set. For example, referring to fig. 2E, in the embodiment of the present application the object to be detected is a peach, and part (b) is the area image formed by the first pixel points of the peach obtained by object identification and binary segmentation. Part (c) is the area image obtained after pixel expansion of part (b), formed jointly by the first pixel points and the second pixel points. As can be seen from part (c), the edge pixel points of the peach are all preserved after the extension, but some ambient (background) pixel points are also introduced.
When the object pixel points are segmented by adopting a binary segmentation method, the first pixel points and the second pixel points in the embodiment of the application are pixel points containing gray information after gray binarization processing.
As an optional embodiment of the present application, when performing the extraction and expansion of the pixel points of the object to be detected and the three-dimensional point cloud screening in S2012, S2021 and S2022, an image Mask (Mask) may be used to complete the extraction and expansion.
Correspondingly, when the object is identified in S2011, the identification result of the object further includes: the mask of the object to be detected, wherein values corresponding to a first pixel point corresponding to the object to be detected in the mask are all 1, and pixel point values except the first pixel point are all 0.
The operation of S2012 at this time may then be replaced with:
Acquiring a color image and a depth image of the object to be detected, and performing object identification on the object to be detected according to the color image to obtain the object type and the mask of the object to be detected.
Correspondingly, the generation processes S2021 and S2022 for the first three-dimensional point cloud may be replaced by:
and acquiring a second pixel point within a preset range of the mask in the color image.
And generating a second three-dimensional point cloud corresponding to the depth image, and screening a third three-dimensional point cloud corresponding to the mask and the second pixel point from the second three-dimensional point cloud.
Wherein the mask is a binary image consisting of 0 and 1. In the embodiment of the application, the mask of the object to be detected can be obtained by setting the pixel points corresponding to the object to be detected in the color image to be 1 values and setting other pixel points to be 0 values. When the mask is processed, only the 1-value pixel points are processed. Therefore, the mask can be used for distinguishing the object to be detected from other contents of the object not to be detected in the color image during subsequent processing. And simultaneously, carrying out edge expansion on the mask, and carrying out three-dimensional point cloud screening according to the range of the mask and the expanded range. The method can effectively avoid losing the edge pixels of the object to be detected during object identification.
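A possible sketch of the mask expansion, assuming OpenCV is available; the elliptical kernel and the 20-pixel radius are illustrative choices rather than requirements of this application:

```python
import cv2
import numpy as np

def expand_object_mask(mask: np.ndarray, radius_px: int = 20) -> np.ndarray:
    """Dilate a binary object mask (values 0/1) so that edge pixels that may
    have been lost during segmentation are covered.

    The returned mask contains the original first pixel points (value 1)
    plus the expanded second pixel points around the object edge."""
    kernel = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (2 * radius_px + 1, 2 * radius_px + 1))
    return cv2.dilate(mask.astype(np.uint8), kernel)
```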
After the pixel point set is obtained, in the embodiment of the present application, a point set corresponding to all pixel points in the pixel point set is screened out from the second three-dimensional point cloud generated according to all contents of the depth image, and then a corresponding third three-dimensional point cloud is obtained. And the third three-dimensional point cloud comprises points of the newly added environment after the edge expansion of the object to be detected. Therefore, the third three-dimensional point cloud needs to be screened for the first three-dimensional point cloud corresponding to the object to be detected, so that the object to be detected is accurately positioned. The screening method of the third three-dimensional point cloud is not limited here, and can be selected or set by a technician according to actual needs. For example, some three-dimensional segmentation models may be used to segment the third three-dimensional point cloud to obtain the first three-dimensional point cloud remaining after segmentation. Or object identification can be carried out in the third three-dimensional point cloud area, and then the three-dimensional point cloud of the object to be detected is screened out.
It should be noted that the image is composed of pixel points. Therefore, in the embodiment shown in fig. 2C, the operation of extracting the first pixel point may also be regarded as extracting the area image of the object to be measured in the color image. At this time, the region image composed of the first pixel points is obtained. Similarly, the extension of the second pixel point is to extend the edge range of the region image based on the region image formed by the first pixel point. And then obtaining the regional image which contains all the first pixel points and the second pixel points after expansion. That is, the expanded region image is composed of the pixels in the pixel set. Reference may be made, for example, to the embodiment shown in fig. 2D and the embodiment shown in fig. 2E.
In the embodiment of the application, pixel point expansion is carried out through an object identification result based on a color image, and then a three-dimensional point cloud area corresponding to an object is found out according to the expanded pixel points. And finally, screening the three-dimensional point cloud of the object to be detected in the searched three-dimensional point cloud area, thereby realizing the rapid searching and positioning of the three-dimensional point cloud of the object.
S2031, obtaining the three-dimensional space shape of the object to be detected according to the object type, and cutting the bounding box according to the three-dimensional space shape. And calculating the volume of the bounding box after cutting, and taking the obtained volume as the volume measurement result of the object to be measured.
Due to the limited types of bounding boxes, it is difficult to adapt to the real shape of different objects in the real world. Therefore, when the objects are classified in advance, corresponding three-dimensional space shape data can be set for each class of objects. For a humanoid robot, for example, a humanoid three-dimensional spatial shape comprising four limbs can be constructed. For bananas, a cylindrical three-dimensional space shape with a certain curvature can be constructed. On this basis, after the bounding box of the object to be detected is generated, the corresponding three-dimensional space shape can be searched according to the type of the object to be detected in the embodiment of the application.
After the three-dimensional space shape of the object to be detected is obtained, the bounding box is cut according to the three-dimensional space shape, that is, the bounding box is cut into the three-dimensional space shape of the object to be detected. The bounding box obtained after cutting fits the first three-dimensional point cloud of the object to be measured more closely. Meanwhile, because the spatial coordinates of each point in the first three-dimensional point cloud are known data, the volume of the cut bounding box can be calculated from these coordinates and used as the volume measurement result of the object to be measured. In some optional embodiments, the cutting principle may be set as: under the condition of satisfying the three-dimensional space shape of the object to be detected, retain the points in the three-dimensional point cloud as far as possible. In this way the finally cut bounding box satisfies the required three-dimensional space shape while the loss of the object's three-dimensional point cloud caused by cutting is reduced as much as possible, which improves the cutting accuracy and thus guarantees the accuracy of the volume measurement.
It should be noted that when the three-dimensional point cloud is generated from the depth image, the three-dimensional coordinates obtained for each point are coordinates in the image coordinate system of the shooting device. These coordinates cannot be used directly to calculate the true volume of the object. Therefore, the three-dimensional coordinates in the image coordinate system need to be converted into three-dimensional coordinates in the world coordinate system for the subsequent calculation of the object volume. This coordinate transformation may occur at any step after the three-dimensional point cloud of the depth image is obtained, for example in S202 or S2031, or between or after these two steps. The method of coordinate conversion is not limited here and can be selected or set by technical personnel according to actual requirements. For example, the calibration parameters of the depth camera and the color camera can be unified according to the calibration information of the shooting device, so as to obtain the intrinsic (internal reference) matrix of the shooting device, and the three-dimensional point cloud in the depth image coordinate system can be converted into the three-dimensional point cloud in the color image coordinate system. At the same time, the rotation angle of the color camera relative to gravity can be obtained from the posture data of the shooting device, such as the magnetometer, acceleration sensor and gyroscope, so as to obtain the extrinsic (external reference) matrix of the color camera relative to the world coordinate system. The three-dimensional coordinates of the point cloud in the world coordinate system are then obtained according to the intrinsic matrix and the extrinsic matrix. When the terminal device itself shoots the color image and the depth image, the shooting device is the terminal device itself.
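The conversion to world coordinates can be outlined as follows; the rotation matrix derived from the magnetometer/accelerometer/gyroscope data and the depth-to-color registration are assumed to already be available, so this is only a sketch of applying an extrinsic transform to camera-frame points, not the full calibration pipeline:

```python
import numpy as np

def camera_to_world(points_cam: np.ndarray, R_world_from_cam: np.ndarray,
                    t_world_from_cam=None) -> np.ndarray:
    """Transform (N, 3) points from the (color) camera coordinate system
    into the world coordinate system.

    R_world_from_cam: 3x3 rotation, e.g. derived from gyroscope /
                      accelerometer / magnetometer data (assumed given).
    t_world_from_cam: optional 3-vector translation; volume is translation
                      invariant, so it may be omitted."""
    points_world = points_cam @ R_world_from_cam.T
    if t_world_from_cam is not None:
        points_world = points_world + t_world_from_cam
    return points_world
```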
As an alternative embodiment of the present application, consider that for the same object the three-dimensional space shape is assumed to be unchanged, but when the object is placed in different postures, the placement angle of the corresponding three-dimensional space shape can differ. Take a cylindrical object as an example: whether it is placed on a table with its bottom face downward or lying on the table with its curved surface downward, its three-dimensional space shape is a cylinder; the difference lies in the placement angle of that shape. On this basis, if only one three-dimensional space shape with a single placement angle is set for a given type of object, then when the actual placement angle or shooting angle of the object to be measured differs from that assumed angle, the bounding box of the object to be measured cannot be cut accurately, and it is difficult to effectively guarantee the volume measurement precision of the object to be measured.
This is illustrated with an example. Assume that the object A to be measured is a cylinder whose height is greater than the diameter of its bottom face, such as a vacuum cup, and that the adopted bounding box is the minimum rectangular bounding box, so the generated bounding box is a cuboid. On this basis, if only the three-dimensional space shape of a cylinder with its bottom face downward is set for object A, then when the bounding box is cut, it can only be cut into a cylinder with its bottom face downward. If object A is indeed placed with its bottom face downward, the cut bounding box is accurate. However, if object A is not placed with its bottom face downward but is lying horizontally with its side downward, the cut bounding box differs greatly from the actual object A, and the calculated volume error will be large.
In order to improve the volume measurement accuracy, in this embodiment of the application, when S2011 performs object identification, the identification result of the object further includes: the object profile shape of the object to be measured.
Referring to fig. 2F and 2G, the operation of S2011 may be replaced with:
S2013, obtaining the color image and the depth image of the object to be detected, and performing object identification on the object to be detected according to the color image to obtain the object type and the object outline shape of the object to be detected.
The object outline shape is the shape, formed by the edge pixel points of the object to be detected, that the object presents in the color image. Alternatively, it may be the shape of the area image formed by the first pixel points of the object. (When a mask is used, the corresponding object outline shape is the edge contour of the region formed by the 1-valued pixel points in the mask, or the overall shape formed by those 1-valued pixel points.) For example, referring to fig. 2H, the gray region is the area image of the object to be measured after pixel graying processing. In part (a), the edge pixel points of the object to be detected are thickened. In the embodiment of the present application, the shape formed by the thickened edge pixel points in part (a) may be used as the object outline shape; the object outline shape may also be the figure formed by the entire gray area in part (b). The specific choice can be set by technical personnel according to actual requirements.
Accordingly, referring to fig. 2F, the operation of S2031 may be replaced with:
S2032, obtaining the three-dimensional space shape of the object to be measured according to the object type and the object outline shape, and cutting the bounding box according to the three-dimensional space shape. And calculating the volume of the bounding box after cutting, and taking the obtained volume as the volume measurement result of the object to be measured.
Alternatively, referring to fig. 2G, it may be replaced with:
S2033, the three-dimensional space shape of the object to be measured is obtained according to the object type, and the cutting strategy is determined according to the object outline shape. And cutting the bounding box according to the cutting strategy until the shape of the cut bounding box is the three-dimensional space shape. And calculating the volume of the bounding box after cutting, and taking the obtained volume as the volume measurement result of the object to be measured.
For a given object, its spatial shape is assumed to be unchanged. However, when the camera shoots the object at different shooting angles and/or with the object in different placement postures, the obtained outline shape of the object may differ (compare the principle that the projection pattern of an object differs when it is illuminated by a light source from different angles). Therefore, the angle at which the object is placed relative to the shooting device at the time of actual shooting can, in theory, be recognized from the shape of the object in the color image. On this basis, the embodiment of the present application can determine the placement posture of the object according to the object outline shape of the object to be measured, and thereby determine the cutting angle for the bounding box. Specifically, two optional implementation manners are provided in the embodiment of the present application:
1. Refer to S2032. In implementation manner 1, when the three-dimensional space shapes are set for each type of object, a plurality of three-dimensional space shapes are set for the same type of object. These three-dimensional space shapes are substantially the same in shape but differ in placement angle. For example, for a three-dimensional space shape that is a cylinder, a plurality of cylinders with different placement angles can be provided, such as a cylinder placed with its bottom face downward, with its curved surface downward, or tilted at an angle of 45°. At the same time, the correspondence between the three-dimensional space shapes and the object outline shapes is set. On this basis, the three-dimensional space shape actually corresponding to the object to be detected is screened out according to the object type and the object outline shape. Finally, the bounding box is cut and its volume calculated according to the screened three-dimensional space shape.
2. Refer to S2033. In implementation manner 2, one three-dimensional space shape is set for the same type of object, but corresponding cutting strategies, i.e., the angle from which the bounding box is cut, are set in advance according to the different placement angles the object may take. For example, for a three-dimensional space shape that is a cylinder, three different cutting strategies can be provided for the cases where the bottom face is vertically downward, the curved surface is downward, and the cylinder is tilted at an angle of 45°. If the object is placed with its bottom face vertically downward, the bottom face of the bounding box can be used as the bottom face of the cylinder for cutting, so that a cylinder with its bottom face vertically downward is obtained after cutting. If the object is placed with its curved surface downward, the bottom face of the bounding box can be used as a plane tangent to the curved surface of the cylinder, and the front and rear faces of the bounding box can be used as the two end faces of the cylinder for cutting, so that a cylinder with its curved surface downward is obtained after cutting. The correspondence between the object outline shape and the cutting strategy is then set according to the relationship between the object outline shape and the object placement angle. On this basis, the cutting strategy for the object to be detected can be determined according to its object outline shape, and the bounding box is cut according to that cutting strategy until the three-dimensional space shape corresponding to the object to be measured is obtained.
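As a worked example of how a cutting strategy turns the bounding box into a cut volume, the sketch below computes the volume of a cylinder cut from a cuboid bounding box for the two placements discussed above; the axis conventions (which box edges correspond to the cylinder height and diameter) are illustrative assumptions:

```python
import math

def cut_cylinder_volume(box_dims, placement: str) -> float:
    """Volume of a cylinder inscribed in a cuboid bounding box.

    box_dims:  (dx, dy, dz) edge lengths of the bounding box, with dz taken
               as the vertical direction (assumed convention).
    placement: 'bottom_down' - circular base rests on the ground, or
               'lying'       - curved surface rests on the ground.
    """
    dx, dy, dz = box_dims
    if placement == "bottom_down":
        radius = min(dx, dy) / 2.0      # base inscribed in the horizontal face
        height = dz
    elif placement == "lying":
        radius = min(dy, dz) / 2.0      # circular cross-section stands vertically
        height = dx                     # cylinder axis lies horizontally
    else:
        raise ValueError(f"unknown placement: {placement}")
    return math.pi * radius ** 2 * height
```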
As an alternative embodiment of the present application, the embodiment shown in fig. 2C may be applied in combination with the embodiment shown in fig. 2F or fig. 2G. At this time, the identification result of the object includes the object type, the first pixel point and the object contour shape of the object to be detected.
Correspondingly, the operation of S2011 may be replaced with:
S2014, obtaining a color image and a depth image of the object to be detected, and performing object identification on the object to be detected according to the color image to obtain the object type, the first pixel point and the object outline shape of the object to be detected.
The region image corresponding to the object to be detected in the color image is composed of the first pixel points. Therefore, the first pixel point of the object to be detected can be obtained instead of obtaining the regional image of the object to be detected in the color image.
In the embodiment of the application, the corresponding three-dimensional space shape is constructed in advance according to the actual space shape and the possible pose of various objects. On the basis, firstly, object identification is carried out according to the color image of the object, and the type of the object, pixel points in the color image and the outline shape of the object are obtained. And simultaneously, three-dimensional point cloud analysis is carried out on the depth image. And carrying out pixel point range expansion according to pixel points of the object to be detected in the color image, screening the three-dimensional point cloud of the object to be detected on the depth image according to the expanded pixel point range, and drawing a bounding box corresponding to the object according to the obtained three-dimensional point cloud. And then, determining the actual three-dimensional space shape of the object to be detected according to the type of the object and the outline shape of the object, and cutting the bounding box according to the actual three-dimensional space shape of the object to be detected, so that the shape of the cut bounding box is changed into the actual three-dimensional space shape of the object. And finally, carrying out volume measurement on the cut bounding box and taking the volume measurement as an object volume measurement result.
Because the three-dimensional point cloud of the object is extracted from the depth image data, while object positioning and three-dimensional point cloud screening are performed on the basis of the color image, the pixel points of the object to be detected in the color image are expanded. This effectively prevents the influence of pixel point loss during object recognition in the color image and further improves the accuracy of drawing the bounding box. After the drawn bounding box is obtained, the corresponding three-dimensional space shape can be further looked up according to the actual type and outline figure of the object, so that the real space shape of the object to be detected can be identified under a variety of different placement postures. Finally, the bounding box is cut according to the real space shape, so that the finally obtained bounding box fits the object more closely and the extraction of the object envelope information is more precise. Therefore, calculating the volume from the cut bounding box can greatly improve the accuracy of the object volume measurement.
In addition, different shooting angles of the object are ultimately reflected as different poses of the object in the color image, because the camera images the object from different angles. Therefore, the embodiment of the present application can support a variety of shooting angles and maintain high-precision volume measurement without being restricted by the shooting angle of the terminal device. It can thus be adapted to more types of terminal devices and to different shooting environments or scenes; for example, it is suitable for a handheld terminal device such as a mobile phone, with which the user can photograph and perform volume recognition of objects from various poses or angles. At the same time, the difficulty of user operation is greatly reduced: the user does not need to find a particular posture or angle to shoot in order for the object volume to be recognized. Therefore, the method and the device have stronger compatibility with different terminal devices and application scenarios, and have higher application value.
In addition, as can be seen from the above analysis, the embodiments of the present application need only one color image and one depth image at minimum to realize volume recognition of an object, which reduces the difficulty of data acquisition for object volume recognition. The images can be shot and processed by the same device or by different devices, which makes object volume recognition in the embodiments of the present application more flexible. For example, some terminal devices that have a color camera and a depth camera and high processing power can shoot the color image and the depth image of the object themselves and perform the volume measurement operations of the embodiments of the present application. For terminal devices that have a color camera and a depth camera but weak processing power, or terminal devices that have sufficient processing power but should not consume excessive resources on volume measurement, the device can serve only as the shooting device for the color image and the depth image. After shooting, the captured data is sent through wired or wireless data transmission to the terminal device that acts as the execution subject of the embodiments of the present application for processing. In this case, the software and hardware requirements on the shooting device can be reduced as much as possible, and the influence on the performance of the shooting device is reduced, enabling users holding older terminal devices to use the embodiments of the present application to measure object volumes. In this case, the terminal device serving as the execution subject need not be configured with camera hardware and the corresponding software, which greatly reduces software and hardware costs.
Several explanations regarding the embodiment shown in fig. 2A, the embodiment shown in fig. 2C, the embodiment shown in fig. 2F, and the embodiment shown in fig. 2G:
Firstly, the object identification of the color image, the acquisition of the pixel points corresponding to the object (segmentation of the area image corresponding to the object), the screening of the three-dimensional point cloud corresponding to the object, and the cutting and volume calculation of the bounding box can all be realized by deep learning models trained in advance.
1. Object identification of the color image and acquisition of the pixel points corresponding to the object.
In order to meet the requirement of daily mobile terminal equipment on object volume measurement, in the embodiment of the application, some lightweight segmentation and classification models can be constructed and trained in advance. And storing the trained segmentation and classification models in the mobile terminal equipment. The embodiment of the present application does not limit the type of the model and the training method used specifically, and the skilled person can select or set the model according to the actual requirements.
For example, in some alternative embodiments, the segmentation and classification model may be constructed using a network structure of MobileNetV2 plus a Region Proposal Network (RPN) plus ROIAlign plus a 5-layer encoder-decoder, with the loss function L = Lcls + Lbox + Lmask, where Lcls is the classification error, Lbox is the detection error, and Lmask is the segmentation error. The corresponding training method is as follows: a plurality of color images are acquired as sample data; each color image used as sample data is given a label of the object type, and the area image corresponding to the object in the color image is annotated. During training, the sample data is input into the constructed segmentation and classification model and the model parameters are updated. After each update, whether the loss function meets the preset requirement is judged, and iterative updating continues when the requirement is not met, until the model converges and the trained segmentation and classification model is obtained.
When using the trained segmentation and classification models, the size of the input data is unified in order to reduce the data throughput. The color image may be adjusted to a predetermined size and then input to the segmentation and classification model. The specific size of the preset size can be set by a technician according to actual needs, and is not limited herein. For example, 224 pixels x 224 pixels can be set. And processing the color image by the segmentation and classification model, and outputting a corresponding object type and a region image corresponding to the object.
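A hedged sketch of how the trained segmentation and classification model might be invoked at inference time; the `model` callable and its output fields are hypothetical placeholders, and only the 224 x 224 resizing comes from the description above:

```python
import cv2
import numpy as np

INPUT_SIZE = (224, 224)   # preset input size mentioned above

def run_segmentation_and_classification(model, color_image: np.ndarray):
    """Resize the color image to the preset size and run a (hypothetical)
    segmentation-and-classification model.

    Returns the predicted object type and the object mask, both assumed to
    be fields produced by the model."""
    resized = cv2.resize(color_image, INPUT_SIZE)
    inputs = resized.astype(np.float32) / 255.0      # simple normalization (assumed)
    outputs = model(inputs[np.newaxis])              # batch of one image
    object_type = outputs["class_id"]                # hypothetical output field
    mask = outputs["mask"]                           # hypothetical output field
    return object_type, mask
```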
2. Screening the three-dimensional point cloud corresponding to the object.
As an optional embodiment of the present application, in order to screen the first three-dimensional point cloud of the object to be measured out of the second three-dimensional point cloud or the third three-dimensional point cloud in the embodiment shown in fig. 2C, a three-dimensional segmentation model is constructed and trained in advance, and this model is used to perform point cloud segmentation on the second or third three-dimensional point cloud. The embodiment of the present application does not unduly limit the model type or the training method of the three-dimensional segmentation model, which can be selected or set by technical personnel according to actual requirements. For example, in some alternative embodiments, the three-dimensional segmentation model may be constructed as a PointNet model plus hierarchical feature extraction plus 3 deconvolution layers. The three-dimensional coordinates are then center-aligned with the help of a coordinate normalization network T-Net. Finally, the three-dimensional segmentation model is trained using an iterative training method.
Meanwhile, when the trained three-dimensional segmentation model is used, the size of the input data is unified in order to reduce the amount of data processing: the three-dimensional point cloud to be processed is uniformly sampled to a preset number of points before being input into the three-dimensional segmentation model. The specific value of the preset number can be set by technical personnel according to actual requirements and is not limited here; for example, in some alternative embodiments it may be set to 4096.
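A minimal sketch of unifying the input size by sampling a fixed number of points (4096 being the example value mentioned above); the sampling strategy is an illustrative assumption:

```python
import numpy as np

def sample_fixed_point_count(points: np.ndarray, n_points: int = 4096,
                             rng=None) -> np.ndarray:
    """Return exactly n_points points: subsample without replacement when the
    cloud is larger than n_points, sample with replacement when it is smaller."""
    rng = rng or np.random.default_rng()
    replace = points.shape[0] < n_points
    idx = rng.choice(points.shape[0], size=n_points, replace=replace)
    return points[idx]
```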
3. Cutting and volume calculation of the bounding box.
The correspondence among the object type, the object outline shape, and the real three-dimensional space shape of the object is complex. If technical personnel manually enumerate the corresponding three-dimensional space shapes for different object types and object outline shapes, the enumeration result is very limited, and for objects with irregular spatial shapes the enumeration method may be inadequate. Therefore, when bounding box cutting is performed based on a matched three-dimensional space shape, the accuracy of the obtained result may sometimes be reduced.
To solve this problem, in the embodiment of the present application, a volume estimation model for adaptively calculating the volume of the object according to the size of the bounding box, the type of the object, and the shape of the outline of the object is constructed and trained in advance. At this time, the operation of S2031 may be replaced with: and acquiring the size of the bounding box, and processing the size of the bounding box, the type of the object and the outline shape of the object by using a pre-trained volume estimation model to obtain a volume measurement result.
The model type and the training method of the volume estimation model are not limited in the embodiment of the present application and can be selected or set by technical personnel according to actual requirements. For example, in some alternative embodiments, a multi-layer perceptron (MLP) regression network model containing 5 hidden layers may be constructed. The model is trained iteratively with previously acquired bounding box sizes, object types and object outline shapes as sample data, and optimal convergence of the model is achieved by minimizing the root mean square error, thereby obtaining the trained volume estimation model.
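A sketch of a volume estimation regressor of the kind described, i.e. a multi-layer perceptron with 5 hidden layers trained toward a root-mean-square-error objective; PyTorch is an assumed framework, and the layer width and input encoding are illustrative:

```python
import torch
import torch.nn as nn

class VolumeEstimator(nn.Module):
    """MLP regressor: bounding-box size + object-type encoding +
    contour-shape encoding -> estimated volume (illustrative sketch)."""
    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(5):                       # 5 hidden layers
            layers += [nn.Linear(d, hidden), nn.ReLU()]
            d = hidden
        layers.append(nn.Linear(d, 1))           # scalar volume output
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Training would minimize the RMSE between predicted and measured volumes, e.g.:
# loss = torch.sqrt(nn.functional.mse_loss(model(features), true_volume))
```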
Secondly, a target tracking function can be preset for the shooting equipment, so that when the object to be detected is shot in a video mode and the color image and the depth image are obtained according to the shot video, the quality of the obtained color image and the obtained depth image is guaranteed.
In the embodiment shown in fig. 2A, the embodiment shown in fig. 2C, the embodiment shown in fig. 2F, and the embodiment shown in fig. 2G, it is possible to extract image frames from a video to obtain the required color image and depth image. When the object moves, the clarity of the video frames is difficult to guarantee, which in turn affects the measurement of the object volume. To avoid this, in the embodiment of the present application a target tracking function is preset in the shooting device, so as to ensure the clarity of the image frames during video shooting.
And thirdly, the embodiment of the application is combined with different scenes for application.
The embodiment shown in fig. 2A, the embodiment shown in fig. 2C, the embodiment shown in fig. 2F, and the embodiment shown in fig. 2G are all related embodiments for measuring the volume of an object. In practical application, the embodiments can be combined with different practical application scenes for application, so that the problems and user requirements existing in the practical scenes are solved, and the practical value of the embodiments is improved. The following is illustrated with a few common application scenarios as examples:
1. Diet-related scenarios.
With the continuous popularization of the concept of healthy diet, people pay more attention to their diet structure, intake energy value and the like. In order to understand data such as nutritional ingredients and energy values of food eaten each time, the conventional method comprises the following steps: the user calculates the data of the nutrient content and the energy value of the food according to the nutrient content table and the energy table in the food package when the food is purchased. But here a food-related list of nutrients and energy is required to allow data calculation. Such as the amount of protein, water and vitamins contained per 100 grams of the food, and the amount of calories contained per 100 grams of the food. However, this method requires the user to know the nutrient content table and energy table of the food, and also requires the user to measure the object volume and calculate the corresponding data, which is complicated to calculate. Therefore, for most daily life scenes, it is difficult for users to know data such as nutritional ingredients and energy values of food.
In order to acquire data such as the nutritional composition and energy value of food, the embodiment of the present application includes:
When the first operation of the user is detected, vision-based volume measurement is performed on the food to obtain the volume of the food.
The method comprises the steps of obtaining the type of food, obtaining a nutrient composition table and/or an energy table of the food according to the type of the food, and calculating nutrient composition data and/or an energy value of the food according to the volume of the food and the nutrient composition table and/or the energy table.
The first operation refers to a functional operation of a user starting food nutrient and/or energy value analysis in the terminal device, for example, when the function is integrated in a certain application program, the first operation refers to an operation of the user starting the function. Any of the embodiments shown in fig. 2A, 2C, 2F and 2G can be used to measure the volume of the food, so as to obtain the volume of the food. Meanwhile, the embodiment of the application can be used for setting corresponding nutrient composition meters and/or energy meters for various different foods in advance. On this basis, corresponding nutritional composition data and/or energy values are calculated from the measured food volume. Therefore, the user does not need to touch food, manually look up tables and calculate data, and the effective management of the self dietary structure and the intake energy value can be realized.
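A toy sketch of the energy calculation from a measured food volume, assuming a per-food lookup table storing density and energy per 100 g; all table values shown are placeholders rather than real nutritional data:

```python
# Hypothetical lookup table: density in g/cm^3 and energy in kcal per 100 g.
FOOD_TABLE = {
    "peach": {"density_g_per_cm3": 1.0, "kcal_per_100g": 39.0},  # placeholder values
}

def estimate_energy_kcal(food_type: str, volume_cm3: float) -> float:
    """Estimate the energy value of a food item from its measured volume."""
    entry = FOOD_TABLE[food_type]
    mass_g = volume_cm3 * entry["density_g_per_cm3"]
    return mass_g / 100.0 * entry["kcal_per_100g"]
```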
2. Scenarios related to commodities and express delivery.
For some commodities and express parcels whose prices are calculated based on volume, calculating an accurate price requires the volume of the commodity or parcel to be measured manually and the corresponding price to be calculated according to the pricing standard. On the one hand, when the quantity of commodities or parcels is large, or their volumes are large, the volume measurement becomes very tedious and error-prone. On the other hand, manual measurement requires measuring staff to come into contact with the commodities or parcels, which can pose certain safety risks to the staff.
In order to improve the efficiency of calculating the prices of commodities and express deliveries, the embodiment of the present application includes the following steps:
When the second operation of the user is detected, vision-based volume measurement is performed on the commodity or the express parcel to obtain its volume.
And acquiring pricing standard data of the commodity or the express parcel, and calculating the price of the commodity or the freight rate of the express parcel according to the volume and the pricing standard data.
The second operation refers to a function operation of starting the calculation of the commodity and/or express delivery price in the terminal device by the user, for example, when the function is integrated in a certain application program, the second operation refers to an operation of starting the function by the user. Any one of the embodiments shown in fig. 2A, 2C, 2F, and 2G may be adopted to measure the volume of the commodity or the express delivery, so as to obtain the volume of the commodity or the express delivery. Therefore, the method and the device can reduce the complexity of commodity or express price calculation operation, improve the calculation efficiency and improve the convenience of life of a user. Meanwhile, the safety of workers can be guaranteed without the need of contacting with commodities or express.
3. House design scenarios.
When a user purchases furniture, the user needs to consider whether the size of the purchased furniture meets the size requirements for placing it in the house, and whether the furniture to be purchased matches the decoration style of the house. In order to help the user achieve these goals more intuitively and conveniently, the embodiment of the present application includes:
When the third operation of the user is detected, vision-based volume measurement is performed on the furniture to obtain the size information and the volume of the furniture.
And when the fourth operation of the user is detected, acquiring an image or video of the furniture area to be placed, and displaying the furniture in an overlapping manner to the image or video of the furniture area to be placed by utilizing an Augmented Reality (AR) technology according to the size information and the volume of the furniture.
And receiving an adjusting instruction of the user for the furniture displayed in an overlapping mode, and adjusting the position of the furniture displayed in the overlapping mode in the image or video of the furniture area to be placed according to the adjusting instruction.
The third operation refers to an operation of a user starting a furniture volume measurement function in the terminal device, for example, when the function is integrated in a certain application program, the third operation refers to an operation of the user starting the function. The fourth operation is an operation of starting the home-specific virtual placement function in the terminal device by the user, for example, when the function is integrated in an application program, the fourth operation is an operation of starting the function by the user. Any of the embodiments shown in fig. 2A, 2C, 2F and 2G can be used to measure the size and volume of the furniture. In the embodiment of the application, after the user performs visual volume measurement on favorite furniture, three-dimensional object modeling can be performed according to the measured size and volume data. When the user needs to judge whether the furniture is appropriate, the user only needs to shoot or take a video aiming at the area of the furniture to be placed in the home. At this moment, the embodiment of the application can display furniture in an overlaid manner in the obtained image or video through the AR technology. The user can freely adjust the position of the furniture in the image or the video according to the requirement so as to meet the furniture placing requirement of the user. The furniture is put through the simulation, so that a user can more visually see whether the size of the furniture meets the requirement. And meanwhile, whether the style of the furniture is proper or not can be judged. Therefore, the furniture layout planning method and the furniture layout planning device can greatly improve the efficiency of furniture layout design and scene layout planning.
4. Game scene.
Many games already support camera-based human-computer interaction; for example, some games capture the user's motion through the camera and control the motion of a game character accordingly. Although these games can capture and analyze the user's motion, the character's appearance is usually produced in a rigid way: developers preset a number of fixed character images, and the user picks one according to preference. This weakens the user's sense of immersion and degrades the gaming experience. To solve this problem, in the embodiments of the present application:
And when the fifth operation of the user is detected, carrying out vision-based volume measurement on the user to obtain the user's height, body circumference and volume.
And performing virtual human body modeling according to the height, body circumference and volume, and creating a game character corresponding to the user from the obtained human body model.
And receiving an adjustment instruction input by the user, and adjusting the size of the game character according to the adjustment instruction.
The fifth operation refers to the operation by which the user starts the game character creation function in the terminal device; for example, when the function is integrated in an application program, the fifth operation is the operation of starting that function. Any of the embodiments shown in fig. 2A, 2C, 2F and 2G can be used to measure the user's height, body circumference and volume, where the height and circumference can be obtained by analyzing the dimensions of the cut bounding box. In this way the embodiment can visually measure the user's real figure and automatically generate a game character that matches it, so the character corresponds much more closely to the actual user. The user therefore identifies more strongly with the character and enjoys a better gaming experience. The user can also adjust the size of the game character according to actual needs, adapting it to the individual requirements of different users.
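A game character of matching stature can be obtained by non-uniformly scaling a template humanoid model with the measured values. The sketch below assumes a template height and a reference circumference; both constants and the scaling rule are illustrative only.

```python
import numpy as np

TEMPLATE_HEIGHT_M = 1.75    # assumed height of the template humanoid model
TEMPLATE_GIRTH_M = 0.95     # assumed reference circumference of the template

def character_scale(measured_height_m: float, measured_girth_m: float) -> np.ndarray:
    """Non-uniform scale matrix: vertical scale from the measured height,
    lateral scale from the measured circumference."""
    s_y = measured_height_m / TEMPLATE_HEIGHT_M
    s_xz = measured_girth_m / TEMPLATE_GIRTH_M
    return np.diag([s_xz, s_y, s_xz])

# Example: scale the template vertices (an N x 3 array) for a 1.68 m user.
# scaled_vertices = template_vertices @ character_scale(1.68, 1.02)
```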
Corresponding to the vision-based volume measurement method of the foregoing embodiments, fig. 3 shows a block diagram of a vision-based volume measurement device provided in an embodiment of the present application; for convenience of description, only the portions related to this embodiment are shown.
Referring to fig. 3, the vision-based volume measuring device includes:
and the object identification module 31 is configured to acquire a color image of the object to be detected, and perform object identification on the color image to obtain an object type of the object to be detected.
And the bounding box generating module 32 is configured to acquire a depth image of the object to be detected, generate a first three-dimensional point cloud of the object to be detected according to the depth image, and generate a bounding box of the first three-dimensional point cloud.
And the volume measuring module 33 is configured to cut the bounding box according to the type of the object, calculate the volume of the cut bounding box, and use the obtained volume as a volume measurement result of the object to be measured.
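The three modules above correspond to object recognition, point-cloud and bounding-box generation, and type-aware volume calculation. As a rough illustration of how they fit together, the sketch below back-projects a depth image with assumed pinhole intrinsics, takes an axis-aligned bounding box, and approximates the cutting step with a per-type fill factor; the intrinsics, the type labels and the factors are placeholders, not values defined by this embodiment.

```python
import numpy as np

FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0   # assumed depth-camera intrinsics

def depth_to_point_cloud(depth_m: np.ndarray) -> np.ndarray:
    """Back-project a depth image (metres) into an N x 3 point cloud."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    pts = np.stack([x, y, depth_m], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                  # drop invalid zero-depth pixels

def bounding_box(points: np.ndarray):
    """Axis-aligned bounding box as (min_corner, max_corner)."""
    return points.min(axis=0), points.max(axis=0)

# Hypothetical fill factors standing in for the type-dependent cutting step.
CUT_FACTOR = {"box": 1.0, "cylinder": np.pi / 4, "sphere": np.pi / 6}

def measure_volume(depth_m: np.ndarray, object_type: str) -> float:
    lo, hi = bounding_box(depth_to_point_cloud(depth_m))
    w, h, d = hi - lo
    return w * h * d * CUT_FACTOR.get(object_type, 1.0)
```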
Further, the volume measurement module 33 includes:
And the first measurement submodule is configured to acquire the three-dimensional space shape of the object to be measured according to the object type and cut the bounding box according to the three-dimensional space shape.
Further, the volume measurement module 33 includes:
and the second measurement submodule is used for acquiring the first size information of the bounding box and processing the first size information and the object type by using a volume estimation model trained in advance to obtain a volume measurement result. The volume estimation model is used for cutting the bounding box and calculating a volume measurement result according to the cut bounding box.
Further, the object identification module 31 may further obtain a first pixel point of the object to be detected in the color image; correspondingly, the bounding box generation module 32 includes the following sub-modules (a sketch of these screening steps follows the list):
and the pixel expansion module is used for screening out second pixel points within a preset range of the object to be detected from the color image to obtain a pixel point set consisting of the first pixel points and the second pixel points.
And the first point cloud screening module is used for generating a second three-dimensional point cloud corresponding to the depth image and screening a third three-dimensional point cloud corresponding to the pixel point set from the second three-dimensional point cloud.
And the second point cloud screening module is used for screening the first three-dimensional point cloud corresponding to the object to be detected from the third three-dimensional point cloud.
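A concrete reading of these three screening steps, with the margin, the image layout and the outlier rule all chosen for illustration only, might look as follows.

```python
import numpy as np

def expand_mask(mask: np.ndarray, margin: int) -> np.ndarray:
    """Grow the object mask by `margin` pixels (simple square dilation),
    yielding the set of first and second pixel points."""
    h, w = mask.shape
    out = np.zeros_like(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    for dy in range(-margin, margin + 1):
        for dx in range(-margin, margin + 1):
            out[np.clip(ys + dy, 0, h - 1), np.clip(xs + dx, 0, w - 1)] = True
    return out

def points_in_mask(points: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Select the 3D points whose source pixels fall inside the mask;
    `points` is assumed to be H*W x 3, laid out row-major like the image."""
    return points[mask.reshape(-1)]

def keep_object_points(points: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Crude second screening step: keep points within k standard deviations
    of the median depth, rejecting background and stray points."""
    z = points[:, 2]
    return points[np.abs(z - np.median(z)) < k * (z.std() + 1e-9)]
```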
Further, the object recognition module 31 may further obtain the object contour shape of the object to be measured; correspondingly, the first measurement sub-module includes:
and acquiring the three-dimensional space shape of the object to be detected according to the object type and the object outline graph.
And obtaining a cutting strategy corresponding to the object contour shape, and cutting the bounding box according to the cutting strategy until the shape of the cut bounding box matches the three-dimensional space shape, at which point the cutting is finished.
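One way to realise such a cutting strategy is to map the recognised contour shape to the solid the cut bounding box should approximate and derive the volume from the box edges accordingly. The mapping below is purely illustrative; the embodiment does not enumerate the actual strategies.

```python
import numpy as np

def cut_volume(edge_lengths, contour_shape: str) -> float:
    """Volume of the bounding box after an idealised cut chosen by the
    2D contour shape recognised in the color image (illustrative mapping)."""
    w, h, d = edge_lengths
    if contour_shape == "circle":
        if max(w, h, d) < 1.2 * min(w, h, d):          # near-cubic box: sphere
            r = min(w, h, d) / 2.0
            return 4.0 / 3.0 * np.pi * r ** 3
        r = min(w, d) / 2.0                            # elongated box: cylinder
        return np.pi * r ** 2 * h
    if contour_shape == "triangle":                    # wedge: half of the box
        return w * h * d / 2.0
    return w * h * d                                   # rectangular: keep the box
```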
Further, the object recognition module 31 may further obtain the object contour shape of the object to be measured; correspondingly, the second measurement sub-module includes:
and processing the first size information, the object outline shape and the object type by using the volume estimation model trained in advance to obtain a volume measurement result.
Further, the object to be measured is food, and the vision-based volume measuring device further includes:
and a first response module for obtaining a volume measurement result of the food by using the object recognition module 31, the bounding box generation module 32 and the volume measurement module 33 in response to a first operation of the user.
And the food analysis module is used for calculating nutrient component data and/or energy value data of the food according to the object type and the volume measurement result, and displaying and/or voice-broadcasting the nutrient component data and/or energy value data.
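Converting the object type and the measured volume into nutrition and energy figures needs a per-type lookup table of density and nutrient content. The values below are rough placeholders used only to show the calculation.

```python
# Hypothetical per-type tables: density in g/cm^3 and energy in kcal per 100 g.
DENSITY_G_PER_CM3 = {"apple": 0.85, "banana": 0.94, "rice": 0.90}
KCAL_PER_100G = {"apple": 52.0, "banana": 89.0, "rice": 130.0}

def food_energy_kcal(object_type: str, volume_cm3: float) -> float:
    """Estimate the energy value of the measured food from its volume."""
    mass_g = DENSITY_G_PER_CM3[object_type] * volume_cm3
    return KCAL_PER_100G[object_type] * mass_g / 100.0

# Example: an apple of 200 cm^3 gives about 88 kcal with these placeholder tables.
```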
Further, the object to be measured is an express parcel, and the vision-based volume measuring device further includes:
and the second response module is used for responding to a second operation of the user and obtaining the express volume measurement result by using the object identification module 31, the bounding box generation module 32 and the volume measurement module 33.
And calculating the freight price of the express delivery according to the volume measurement result.
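Courier pricing from a measured volume is commonly based on a volumetric (dimensional) weight; the divisor, rate and the max-with-actual-weight rule in the sketch below are conventional placeholders, not values given by this embodiment.

```python
def freight_price(volume_cm3: float, rate_per_kg: float = 12.0,
                  volumetric_divisor: float = 6000.0,
                  actual_weight_kg: float = 0.0) -> float:
    """Charge on the larger of the volumetric weight (volume / divisor)
    and the actual weight, a common courier convention."""
    chargeable_kg = max(volume_cm3 / volumetric_divisor, actual_weight_kg)
    return rate_per_kg * chargeable_kg

# Example: a 40 x 30 x 25 cm parcel (30000 cm^3) -> 5 kg chargeable -> 60.0.
```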
Further, the object to be measured is furniture, and the volume measuring module 33 includes:
and the first size calculation module is used for acquiring second size information of the cut bounding box, taking the second size information as the size information of the object to be detected, and calculating the volume of the cut bounding box according to the second size information.
Accordingly, the vision-based volumetric measuring device further comprises:
and a third response module, configured to, in response to a third operation by the user, obtain a volume measurement result and second size information of the furniture by using the object identification module 31, the bounding box generation module 32, and the volume measurement module 33.
And the fourth response module is used for responding to the fourth operation of the user, acquiring the image or the video of the preset space region, and displaying the furniture in an overlapping manner to the image or the video of the preset space region by using the augmented reality technology according to the volume measurement result and the second size information.
Further, the object to be measured is a human body, and the volume measurement module 33 includes:
and the second size calculation module is used for acquiring second size information of the cut bounding box, taking the second size information as the size information of the object to be detected, and calculating the volume of the cut bounding box according to the second size information.
Accordingly, the vision-based volumetric measuring device further comprises:
and a fifth response module, configured to, in response to a fifth operation by the user, obtain a volume measurement result and second size information of the human body by using the object identification module 31, the bounding box generation module 32, and the volume measurement module 33.
And the character creation module is used for carrying out virtual human body modeling according to the volume measurement result and the second size information to obtain a human body model corresponding to the human body, and creating a game character associated with the human body by using the human body model.
The process of implementing each function by each module in the vision-based volume measurement device provided in the embodiment of the present application may specifically refer to the description of the embodiment shown in fig. 2A, the embodiment shown in fig. 2C, the embodiment shown in fig. 2F, the embodiment shown in fig. 2G, and other related method embodiments, and is not repeated here.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to" determining "or" in response to detecting ". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance. It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements in some embodiments of the application, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first table may be named a second table, and similarly, a second table may be named a first table, without departing from the scope of various described embodiments. The first table and the second table are both tables, but they are not the same table.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The volume measurement method based on the vision provided by the embodiment of the application can be applied to terminal devices such as a mobile phone, a tablet personal computer, a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA) and the like, and the embodiment of the application does not limit the specific types of the terminal devices at all.
For example, the terminal device may be a station (ST) in a WLAN, a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA) device, a handheld device with wireless communication capability, a computing device or other processing device connected to a wireless modem, a vehicle-mounted device, an Internet-of-Vehicles terminal, a computer, a laptop, a handheld communication device, a handheld computing device, a satellite wireless device, a wireless modem card, a television set-top box (STB), Customer Premises Equipment (CPE), and/or another device for communicating over a wireless system or a next-generation communication system, for example a terminal device in a 5G network or a terminal device in a future evolved Public Land Mobile Network (PLMN), etc.
By way of example and not limitation, when the terminal device is a wearable device, the wearable device may be a general term for devices that apply wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories; it is not merely a hardware device, but realizes powerful functions through software support, data interaction and cloud interaction. Broadly, wearable intelligent devices include full-featured, larger devices that can realize all or part of their functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on a single application function and need to be used together with another device such as a smartphone, for example various smart bracelets and smart jewelry for monitoring physical signs.
Fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 4, the terminal device 4 of this embodiment includes: at least one processor 40 (only one is shown in fig. 4) and a memory 41, the memory 41 having stored therein a computer program 42 executable on the processor 40. When executing the computer program 42, the processor 40 implements the steps in the various vision-based volume measurement method embodiments described above, such as steps 2011 to 2031 shown in fig. 2A. Alternatively, when executing the computer program 42, the processor 40 implements the functions of the modules/units in the above device embodiments, such as the functions of the modules 31 to 33 shown in fig. 3.
The terminal device 4 may be a desktop computer, a notebook, a palmtop computer, a cloud server or another computing device. The terminal device may include, but is not limited to, the processor 40 and the memory 41. It will be understood by those skilled in the art that fig. 4 is merely an example of the terminal device 4 and does not constitute a limitation on it; the terminal device may include more or fewer components than those shown, may combine some components, or may have different components. For example, the terminal device may also include input and output devices, a network access device, a bus, and the like.
The processor 40 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may in some embodiments be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 41 may also be used to temporarily store data that has been transmitted or is to be transmitted.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
The integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on such an understanding, all or part of the flow in the methods of the above embodiments may be realized by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the scope of the present application.
Finally, it should be noted that: the above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A vision-based volumetric measurement method, comprising:
acquiring a color image of an object to be detected, and performing object identification on the color image to obtain the object type of the object to be detected;
acquiring a depth image of the object to be detected, generating a first three-dimensional point cloud of the object to be detected according to the depth image, and generating a bounding box of the first three-dimensional point cloud;
and cutting the bounding box according to the type of the object, calculating the volume of the cut bounding box, and taking the obtained volume as the volume measurement result of the object to be measured.
2. The vision-based volumetric measurement method of claim 1, wherein said cutting the bounding box according to the object type comprises:
and acquiring the three-dimensional space shape of the object to be detected according to the object type, and cutting the bounding box according to the three-dimensional space shape.
3. The vision-based volume measurement method according to claim 1, wherein the cutting the bounding box according to the object type, calculating the volume of the cut bounding box, and using the obtained volume as the volume measurement result of the object to be measured comprises:
acquiring first size information of the bounding box, and processing the first size information and the object type by using a volume estimation model trained in advance to obtain a volume measurement result; the volume estimation model is used for cutting the bounding box and calculating the volume measurement result according to the cut bounding box.
4. The vision-based volumetric measurement method according to any one of claims 1 to 3, wherein after the operation of object recognition on the color image, a first pixel point of the object to be measured in the color image is also obtained;
correspondingly, the generating a first three-dimensional point cloud of the object to be detected according to the depth image includes:
screening out second pixel points within a preset range of the object to be detected from the color image to obtain a pixel point set consisting of the first pixel points and the second pixel points;
generating a second three-dimensional point cloud corresponding to the depth image, and screening a third three-dimensional point cloud corresponding to the pixel point set from the second three-dimensional point cloud;
and screening the first three-dimensional point cloud corresponding to the object to be detected from the third three-dimensional point cloud.
5. The vision-based volumetric measurement method of claim 2, wherein after the operation of object recognition on the color image, an object contour shape of the object to be measured is also obtained;
correspondingly, the obtaining the three-dimensional shape of the object to be detected according to the object type includes:
and acquiring the three-dimensional space shape of the object to be detected according to the object type and the object outline graph.
6. The vision-based volumetric measurement method of claim 2, wherein after the operation of object recognition on the color image, an object contour shape of the object to be measured is also obtained;
correspondingly, the cutting the bounding box according to the three-dimensional space shape comprises the following steps:
and obtaining a cutting strategy corresponding to the contour shape of the object, and cutting the bounding box according to the cutting strategy until the shape of the cut bounding box is the three-dimensional space shape, thereby finishing cutting.
7. The vision-based volumetric measurement method of claim 3, wherein after the operation of object recognition on the color image, an object contour shape of the object to be measured is also obtained;
correspondingly, the processing the first size information and the object type by using the volume estimation model trained in advance to obtain the volume measurement result includes:
and processing the first size information, the object outline shape and the object type by using the volume estimation model trained in advance to obtain the volume measurement result.
8. The vision-based volumetric measurement method of any one of claims 1 to 7, wherein the object to be detected is food, the method further comprising:
responding to a first operation of a user, and executing the operation of acquiring the color image of the object to be measured until the volume measurement result of the food is obtained;
and calculating nutrient component data and/or energy value data of the food according to the object type and the volume measurement result, and displaying and/or voice-broadcasting the nutrient component data and/or the energy value data.
9. The vision-based volumetric measurement method of any one of claims 1 to 7, wherein the object to be detected is an express parcel, the method further comprising:
responding to a second operation of a user, and executing the operation of obtaining the color image of the object to be measured until the volume measurement result of the express parcel is obtained;
and calculating the freight price of the express parcel according to the volume measurement result.
10. The vision-based volumetric measurement method of any one of claims 1 to 7, wherein the object to be detected is furniture, and the calculating the volume of the cut bounding box comprises:
acquiring second size information of the cut bounding box, taking the second size information as the size information of the object to be detected, and calculating the volume of the cut bounding box according to the second size information;
correspondingly, the vision-based volumetric measurement method further comprises:
responding to a third operation of a user, and executing the operation of obtaining the color image of the object to be measured until the volume measurement result and the second size information of the furniture are obtained;
and responding to the fourth operation of the user, acquiring an image or a video of a preset space region, and displaying the furniture in an overlaying manner to the image or the video of the preset space region by utilizing an augmented reality technology according to the volume measurement result and the second size information.
11. The vision-based volume measurement method according to any one of claims 1 to 7, wherein the object to be detected is a human body, and the calculating the volume of the cut bounding box comprises:
acquiring second size information of the cut bounding box, taking the second size information as the size information of the object to be detected, and calculating the volume of the cut bounding box according to the second size information;
correspondingly, the vision-based volumetric measurement method further comprises:
responding to a fifth operation of a user, and executing the operation of obtaining the color image of the object to be measured until the volume measurement result and the second size information of the human body are obtained;
and performing virtual human body modeling according to the volume measurement result and the second size information to obtain a human body model corresponding to the human body, and creating a game role associated with the human body by using the human body model.
12. A terminal device, characterized in that the terminal device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 11 when executing the computer program.
13. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 11.
CN202010248237.7A 2020-03-31 2020-03-31 Vision-based volume measurement method and terminal equipment Pending CN113538321A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010248237.7A CN113538321A (en) 2020-03-31 2020-03-31 Vision-based volume measurement method and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010248237.7A CN113538321A (en) 2020-03-31 2020-03-31 Vision-based volume measurement method and terminal equipment

Publications (1)

Publication Number Publication Date
CN113538321A true CN113538321A (en) 2021-10-22

Family

ID=78087729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010248237.7A Pending CN113538321A (en) 2020-03-31 2020-03-31 Vision-based volume measurement method and terminal equipment

Country Status (1)

Country Link
CN (1) CN113538321A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105588513A (en) * 2016-04-11 2016-05-18 上海斐讯数据通信技术有限公司 Object volume measuring method and object volume measuring system based on mobile terminal
CN106548480A (en) * 2016-12-23 2017-03-29 蚌埠学院 A kind of agricultural product volume rapid measurement device and measuring method based on machine vision
CN110276317A (en) * 2019-06-26 2019-09-24 Oppo广东移动通信有限公司 A kind of dimension of object detection method, dimension of object detection device and mobile terminal

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114167468A (en) * 2021-12-14 2022-03-11 四川大学 Target space positioning method based on image and GNSS
CN114723696A (en) * 2022-03-29 2022-07-08 广州宏江智能装备有限公司 Cloud sharing storage system and volume measuring device thereof
CN115767151A (en) * 2022-09-26 2023-03-07 深圳创维-Rgb电子有限公司 Super television system, television control method, display device and storage medium

Similar Documents

Publication Publication Date Title
CN111652678A (en) Article information display method, device, terminal, server and readable storage medium
Liu et al. Real-time robust vision-based hand gesture recognition using stereo images
US12020383B2 (en) Facial synthesis in augmented reality content for third party applications
CN113973173B (en) Image synthesis method and electronic equipment
CN113538321A (en) Vision-based volume measurement method and terminal equipment
US20240071131A1 (en) Interactive augmented reality content including facial synthesis
US20220262035A1 (en) Method, apparatus, and system for determining pose
CN112749613B (en) Video data processing method, device, computer equipment and storage medium
WO2022073417A1 (en) Fusion scene perception machine translation method, storage medium, and electronic device
CN116048244B (en) Gaze point estimation method and related equipment
WO2022007707A1 (en) Home device control method, terminal device, and computer-readable storage medium
CN114111704B (en) Method and device for measuring distance, electronic equipment and readable storage medium
CN115619858A (en) Object reconstruction method and related equipment
EP4276760A1 (en) Pose determination method and related device
EP4209996A1 (en) Target tracking method and electronic device
CN112085795B (en) Article positioning method, device, equipment and storage medium
WO2023216957A1 (en) Target positioning method and system, and electronic device
CN113536834A (en) Pouch detection method and device
CN114125134B (en) Contactless operation method and device, server and electronic equipment
CN111982293B (en) Body temperature measuring method and device, electronic equipment and storage medium
CN113705292A (en) Time sequence action detection method and device, computer equipment and storage medium
WO2022179271A1 (en) Search result feedback method and device, and storage medium
CN116343247B (en) Form image correction method, device and equipment
CN116761082B (en) Image processing method and device
CN115437601B (en) Image ordering method, electronic device, program product and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination