CN111862098B - Individual matching method, device, equipment and medium based on light field semantics - Google Patents


Info

Publication number: CN111862098B (grant); application published as CN111862098A
Application number: CN201910361188.5A
Authority: CN (China)
Prior art keywords: individual, focusing, semantic segmentation, semantic, light field
Inventor: 刘睿洋 (Liu Ruiyang)
Assignee (original and current): Yaoke Intelligent Technology Shanghai Co., Ltd.
Other languages: Chinese (zh)
Legal status: Active

Classifications

    • G06T7/10 (Image analysis: Segmentation; Edge detection)
    • G06T7/557 (Depth or shape recovery from multiple images, from light fields, e.g. from plenoptic cameras)
    • G06T7/596 (Depth or shape recovery from multiple images, from three or more stereo images)
    • G06T2207/10052 (Image acquisition modality: Images from lightfield camera)

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The application provides an individual matching method, device, equipment and medium based on light field semantics. A light field image set containing different viewing angles is acquired; the light field image of any one viewing angle is refocused to form a focus image set, and semantic analysis is performed on each focus image to obtain the focus semantic segmentation of different individuals; individuals belonging to the same class are clustered according to the results of the focus semantic segmentations to obtain each individual's depth value range on the different focus images; semantic analysis is performed on the light field images at the other viewing angles to obtain the original semantic segmentation of the corresponding individuals at those angles; and the correspondence of each individual across the viewing angles is obtained by reprojection according to the depth value ranges. By obtaining the correspondence of semantic individuals across multiple viewing angles and limiting the search range to the same region during stereo matching, the application greatly shortens the matching computation time.

Description

Individual matching method, device, equipment and medium based on light field semantics
Technical Field
The application relates to the technical field of computer vision processing, in particular to an individual matching method, device, equipment and medium based on light field semantics.
Background
Stereo matching is mostly performed over the whole picture, and the optimization target considers only the color information of the picture. As a result, matching often fails when the scene contains repetitive patterns, which affects the accuracy of the result. Deep-learning-based semantic analysis can, to a certain extent, remedy such matching failures that arise when only color is considered.
Semantic segmentation is one of the fundamental tasks of computer vision. The task takes a two-dimensional image as input, separates the different object regions in the image by a visual algorithm, and identifies the content (semantic value) of each region; that is, it assigns a semantic category to every pixel of the picture while preserving the continuity of image regions. Traditional segmentation methods mainly build classifiers on statistical models such as conditional random fields and random forests; with deep learning, convolutional neural networks have achieved efficient image classification and, at the same time, great progress on the segmentation problem. Meanwhile, with the development of multi-view geometry, more and more researchers fuse stereoscopic information into the traditional monocular pipeline and obtain better algorithm performance; however, multi-frame acquisition is difficult to perform in real time, and multi-camera systems raise complex system problems such as synchronization setup. In addition, semantic segmentation often fails to yield satisfactory results when there are severe occlusions in the scene.
A light field camera is a special case of multi-view geometry in which the cameras are arranged according to a fixed rule. Compared with a general multi-camera system, a light field camera combines the advantages of multi-view geometry and monocular algorithms: it can effectively remove occlusion and can collect multi-view information in a single exposure.
However, it is difficult to obtain from the multi-view information the correspondence of semantic individuals across the viewing angles. Consequently, when searching for a matching pixel for each pixel during stereo matching, the search range of each pixel cannot be effectively limited, which greatly increases the matching computation time.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present application is to provide an individual matching method, apparatus, device and medium based on light field semantics, so as to solve the problem that in the prior art it is difficult to obtain the correspondence of semantic individuals across multiple viewing angles.
To achieve the above and other related objects, the present application provides an individual matching method based on light field semantics, the method comprising: acquiring a light field image set containing different viewing angles; refocusing the light field image of any one viewing angle at different depths to form a focus image set, and performing semantic analysis on each focus image to obtain the focus semantic segmentation of different individuals; clustering individuals belonging to the same class according to the results of the focus semantic segmentations to obtain each individual's depth value range on the different focus images; performing semantic analysis on the light field images at the other viewing angles to obtain the original semantic segmentation of the corresponding individuals; mapping each individual's focus semantic segmentation result to each viewing angle by reprojection according to that individual's depth value range on the focus image set, to form a mapped semantic segmentation; and obtaining the correspondence of each individual across the viewing angles by comparing the similarity of the original semantic segmentation and the mapped semantic segmentation.
In an embodiment of the present application, the light field image set is composed of multi-view images obtained by shooting a scene corresponding to the camera array.
In an embodiment of the application, during refocusing the different focus depths are selected by averaging, i.e. by dividing the depth range evenly.
In an embodiment of the present application, the method of performing semantic analysis on each focus image to obtain the focus semantic segmentation of different individuals includes: finding each individual by target detection, and applying a semantic label and a segmentation to each individual via its bounding box; and calculating the semantic confidence and degree of focus corresponding to each individual, thereby obtaining the quality distribution of the focus semantic segmentation corresponding to the focus image.
In an embodiment of the present application, the method of clustering individuals belonging to the same class according to the results of the focus semantic segmentations includes: for any two individuals, roughly estimating their similarity from the difference between their bounding boxes and the difference between their depth values, so as to judge whether the two individuals belong to the same class.
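The rough same-class test described in this embodiment can be sketched as a bounding-box overlap check combined with a depth-difference check. The IoU metric, the thresholds, and the (x0, y0, x1, y1) box format below are illustrative assumptions, not values given by the application:

```python
def same_instance(box_a, depth_a, box_b, depth_b,
                  iou_thresh=0.5, depth_thresh=0.2):
    """Rough same-class test for two detected individuals, combining
    bounding-box overlap (IoU) with closeness of estimated depth.
    Thresholds and the IoU choice are illustrative assumptions."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Intersection rectangle (clamped to zero when boxes are disjoint).
    ix = max(0, min(ax1, bx1) - max(ax0, bx0))
    iy = max(0, min(ay1, by1) - max(ay0, by0))
    inter = ix * iy
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    iou = inter / union if union else 0.0
    return iou >= iou_thresh and abs(depth_a - depth_b) <= depth_thresh
```

Two detections with heavily overlapping boxes and close depth values are then merged into one cluster, matching the coarse estimation described above.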
In an embodiment of the present application, the method further includes: taking each individual's depth value range over the different focus images as the depth value range of the clustered individual; and selecting, within each individual's cluster, the instance with the highest quality distribution, obtaining its semantic information through semantic analysis, and taking its corresponding depth value range as the result of that individual's focus semantic segmentation.
In an embodiment of the present application, the method of mapping each individual's focus semantic segmentation result to each viewing angle by reprojection, according to that individual's depth value range on the focus image set, to form a mapped semantic segmentation includes: for each pixel in the light field image at another target viewing angle, finding the corresponding pixels of the focus image set (at the refocused viewing angle) at the different focus depths; and selecting, among them, the semantic information with the smallest focus depth that does not belong to the background as the semantic information of the current pixel at the target viewing angle.
In an embodiment of the present application, the method of obtaining the correspondence of each individual across the viewing angles by comparing the similarity of the original semantic segmentation and the mapped semantic segmentation includes: comparing the similarity value between the original semantic segmentation and the mapped semantic segmentation with a preset value; if the value is smaller than the preset value, selecting the mapped semantic segmentation result to represent that individual's segmentation result at each viewing angle; and obtaining the correspondence of each individual across the viewing angles from these per-view segmentation results.
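The comparison against a preset value can be sketched as a pixel-wise comparison of the two masks. The mismatch-ratio metric and the threshold below are illustrative assumptions; the application does not specify the exact similarity measure:

```python
def choose_segmentation(original, mapped, max_mismatch=0.3):
    """Compare an original and a reprojected (mapped) segmentation mask
    pixel by pixel; if the mismatch ratio is below the preset value,
    adopt the mapped result for this view. Masks are 2-D lists of
    labels; the metric and threshold are illustrative assumptions."""
    total = sum(len(row) for row in original)
    diff = sum(1 for ro, rm in zip(original, mapped)
               for a, b in zip(ro, rm) if a != b)
    ratio = diff / total if total else 1.0
    return mapped if ratio < max_mismatch else original
```

When the masks agree closely, the mapped (reprojected) segmentation is taken as the individual's result at that viewing angle, which is how the cross-view correspondence is propagated.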
In an embodiment of the present application, the depth value range of each individual corresponds to a disparity range, and this disparity range can be used as the disparity search interval during stereo matching, thereby reducing the computation time of stereo matching.
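The conversion from a depth value range to a disparity search interval can be sketched with the standard rectified pinhole relation d = f · b / z. This relation is standard stereo geometry, not a formula stated in the application:

```python
def disparity_range(z_min, z_max, focal_px, baseline):
    """Convert an individual's depth range [z_min, z_max] into a
    disparity search interval using d = f * b / z for rectified views
    (standard pinhole stereo relation; f in pixels, b the baseline).
    Larger depth gives smaller disparity, so the bounds swap."""
    assert 0 < z_min <= z_max
    return (focal_px * baseline / z_max, focal_px * baseline / z_min)
```

The stereo matcher then only searches candidate matches within this interval instead of the full disparity range, which is the source of the time saving claimed here.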
To achieve the above and other related objects, the present application provides an electronic device comprising: an acquisition module for acquiring a light field image set containing different viewing angles; and a processing module for refocusing the light field image of any selected viewing angle at different depths to form a focus image set, and performing semantic analysis on each focus image to obtain the focus semantic segmentation of different individuals; clustering individuals belonging to the same class according to the results of the focus semantic segmentations to obtain each individual's depth value range on the different focus images; performing semantic analysis on the light field images at the other viewing angles to obtain the original semantic segmentation of the corresponding individuals; mapping each individual's focus semantic segmentation result to each viewing angle by reprojection according to that individual's depth value range on the focus image set, to form a mapped semantic segmentation; and obtaining the correspondence of each individual across the viewing angles by comparing the similarity of the original semantic segmentation and the mapped semantic segmentation.
To achieve the above and other related objects, the present application provides an electronic device comprising: a memory, a processor, and a communicator; the memory is used for storing computer instructions; the processor is configured to execute computer instructions to implement the method as described above; the communicator communicates with a connected external device.
To achieve the above and other related objects, the present application provides a non-transitory computer-readable storage medium storing computer instructions which, when executed, perform a method as described above.
In summary, the individual matching method, device, equipment and medium based on light field semantics of the present application acquire a light field image set containing different viewing angles; refocus the light field image of any one viewing angle at different depths to form a focus image set, and perform semantic analysis on each focus image to obtain the focus semantic segmentation of different individuals; cluster individuals belonging to the same class according to the results of the focus semantic segmentations to obtain each individual's depth value range on the different focus images; perform semantic analysis on the light field images at the other viewing angles to obtain the original semantic segmentation of the corresponding individuals; map each individual's focus semantic segmentation result to each viewing angle by reprojection according to that individual's depth value range on the focus image set, to form a mapped semantic segmentation; and obtain the correspondence of each individual across the viewing angles by comparing the similarity of the original semantic segmentation and the mapped semantic segmentation.
This has the following beneficial effects:
the correspondence of semantic individuals across multiple viewing angles is obtained, and the search range during stereo matching is limited to the same region, so that the matching computation time is greatly shortened.
Drawings
Fig. 1 shows a schematic view of an individual matching method based on light field semantics in an embodiment of the present application.
Fig. 2 is a flow chart of an individual matching method based on light field semantics in an embodiment of the application.
Fig. 3 is a conceptual diagram of a light field according to an embodiment of the application.
Fig. 4 is a schematic diagram of a light field camera array according to an embodiment of the application.
Fig. 5 shows a scene diagram of a light field camera application and a light field image set according to an embodiment of the application.
Fig. 6 is a schematic block diagram of an electronic device according to an embodiment of the application.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the application.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The application may also be practiced or carried out in other, different embodiments, and details of this description may be modified or varied in various respects without departing from the spirit and scope of the present application. It should be noted that, where there is no conflict, the embodiments of the present application and the features of the embodiments may be combined with each other.
The embodiments of the present application will be described in detail below with reference to the attached drawings so that those skilled in the art to which the present application pertains can easily implement the present application. This application may be embodied in many different forms and is not limited to the embodiments described herein.
In order to clearly explain the present application, components irrelevant to the description are omitted, and the same or similar components are given the same reference numerals throughout the description.
Throughout the specification, when a component is said to be "connected" to another component, this includes not only the case of "direct connection" but also the case of "indirect connection" with other elements interposed therebetween. In addition, when a certain component is said to "include" a certain component, unless specifically stated to the contrary, it is meant that other components are not excluded, but other components may be included.
When an element is referred to as being "on" another element, it can be directly on the other element, or intervening elements may also be present. When an element is stated to be "directly on" another element, there are no other elements between them.
Although the terms first, second, etc. may be used herein to describe various elements in some examples, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element, such as a first interface and a second interface. Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including" specify the presence of stated features, steps, operations, elements, components, items, categories, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, categories, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C." An exception to this definition occurs only when a combination of elements, functions, steps or operations is in some way inherently mutually exclusive.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the language clearly indicates the contrary. The meaning of "comprising" in the specification is to specify the presence of stated features, regions, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of other features, regions, integers, steps, operations, elements, and/or components.
Terms representing relative spaces such as "lower", "upper", and the like may be used to more easily describe the relationship of one component relative to another component illustrated in the figures. Such terms refer not only to the meanings indicated in the drawings, but also to other meanings or operations of the device in use. For example, if the device in the figures is turned over, elements described as "under" other elements would then be oriented "over" the other elements. Thus, the exemplary term "lower" includes both upper and lower. The device may be rotated 90 deg. or at other angles and the terminology representing relative space is to be construed accordingly.
Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. Terms defined in commonly used dictionaries are to be interpreted as having meanings consistent with the related technical literature and the present disclosure, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein.
To address the problems that arise in stereo matching with traditional monocular vision algorithms, such as the impossibility of real-time multi-frame acquisition, occlusion, and synchronization setup, the present method is proposed on the basis of light field technology. The technical problem it solves is that, from data acquired by a light field, the correspondence of semantic individuals across multiple viewing angles is difficult to obtain.
For an easy understanding of the method according to the application, reference is made to fig. 1. As shown, the method mainly relates to focused image semantic segmentation and normal light field semantic segmentation.
Focus semantic segmentation forms a focus image set by refocusing the light field image at one viewing angle (the light field data contains multiple viewing angles) at different depths; the focus image set is a group of images from a fixed viewpoint with different focal planes. During refocusing, a focal stack can be generated from the light field input by different depth settings. Individual semantic segmentation is then performed on each focus image in combination with semantic analysis; a quality evaluation of the focus images is obtained by calculating each individual's semantic confidence and degree of focus; and each individual's depth value range over the different focus images is obtained by clustering, forming the focus semantic segmentation result.
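The focal-stack construction described above can be sketched as shift-and-add refocusing over sub-aperture views: each view is shifted by an offset proportional to the chosen focus depth and the shifted views are averaged. The per-view integer pixel shifts and the data layout below are illustrative assumptions, not the application's exact procedure:

```python
def refocus(views, shifts):
    """Shift-and-add refocus: average sub-aperture views after shifting
    each by a per-view pixel offset (the offsets encode the chosen
    focus depth). `views` maps a view index (s, t) to a 2-D list of
    pixel values; `shifts` maps the same index to (dy, dx)."""
    first = next(iter(views.values()))
    h, w = len(first), len(first[0])
    out = [[0.0] * w for _ in range(h)]
    for key, img in views.items():
        dy, dx = shifts[key]
        for y in range(h):
            for x in range(w):
                sy, sx = y + dy, x + dx
                if 0 <= sy < h and 0 <= sx < w:
                    out[y][x] += img[sy][sx]
    n = len(views)
    return [[v / n for v in row] for row in out]
```

Running this once per sampled depth yields the focal stack; objects at the matching depth align across views and appear sharp, while others blur.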
Semantic analysis is performed on the light field images at the other viewing angles, and the original light field semantic segmentation result corresponding to each detected individual is obtained.
Each individual's focus semantic segmentation result is then mapped to each viewing angle by reprojection according to its depth value range, to form a mapped semantic segmentation.
Finally, the correspondence of each individual across the viewing angles is obtained by comparing the similarity of the original semantic segmentation and the mapped semantic segmentation.
In addition, as shown in fig. 1, it should further be noted that the depth value range of each individual corresponds to a disparity range, and this disparity range can be used as the disparity search interval during stereo matching, thereby reducing the computation time of stereo matching.
As shown in fig. 2, a flow chart of an individual matching method based on light field semantics in an embodiment of the present application is shown. As shown, the method includes:
step S301: a set of light field images is acquired that contain different perspectives.
In an embodiment of the application, the light field image set is composed of multi-view images obtained by shooting a scene corresponding to the camera array.
In this embodiment, the method and the final stereo matching algorithm are performed based on the light field camera.
The light field has good de-occlusion properties. Deep-learning-based semantic segmentation models mostly operate on a single picture, so the correspondence of semantic individuals across multiple viewing angles is difficult to obtain, and semantic segmentation cannot yield satisfactory results when there is severe occlusion in the scene.
Generally, a light field refers to the information carried by light during propagation, including the intensity, position and direction of the light. As shown in fig. 3, L is the intensity of the light, the (u, v) plane gives the position of the light in space, and the (s, t) plane gives the direction in which the ray propagates. The light field shown is in fact a four-dimensional parameterization: a radiance field that simultaneously contains position and direction information in space. In short, it covers all the information a ray carries during its propagation: the two-dimensional position information (u, v) and the two-dimensional direction information (s, t).
The light field is thus a four-dimensional description of rays propagating in space: a parameterized representation of the four-dimensional radiance field that simultaneously contains position and direction information, the totality of all ray radiance functions in space. Real information about the whole spatial environment can be obtained at any angle and any position in the space, so the image information obtained from a light field is more comprehensive and of better quality.
Concretely, the light field is a 4D vector that specifies each ray by its intersections with two parallel planes, st and uv, where (s, t) is the camera plane representing the position of each light field camera and (u, v) represents the pixel coordinates of the desired rendered view.
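This two-plane parameterization can be read as a 4-D lookup L(s, t, u, v): (s, t) selects a camera in the array and (u, v) selects a pixel in that camera's image. The class name and storage layout below are illustrative, not part of the application:

```python
class LightField:
    """4-D light field L(s, t, u, v): (s, t) selects a camera in the
    array and (u, v) selects a pixel in that camera's image.
    Storage layout is an illustrative assumption."""
    def __init__(self, images):
        # images: dict mapping (s, t) -> 2-D list of pixel values,
        # indexed as images[(s, t)][v][u] (row v, column u).
        self.images = images

    def sample(self, s, t, u, v):
        """Return the radiance sample for the ray (s, t, u, v)."""
        return self.images[(s, t)][v][u]
```

A light field camera array with M × N cameras then fills the (s, t) grid, one sub-aperture image per camera.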
As shown in fig. 4, which is a schematic structural diagram of a light field camera array, one light field camera array is composed of M × N cameras, where M, N > 0, and the cameras are uniformly arranged in a grid with pitch b. The number and spacing of the cameras may be determined by the particular usage scenario. Fig. 5 shows an application scene of a light field camera and the resulting light field image set: a light field camera array is used to obtain a light field image set of a scene, where the set contains multiple viewing angles.
It should be noted that mentioning the light field camera array does not mean the array must be used to perform the method; what matters is the acquired light field image set, the relevant parameters (intrinsic and extrinsic) of the acquisition device, and how the depth is optimized once these inputs are obtained. The light field camera array is not an integral part of the apparatus to which the method is applied.
For example, when the method is applied to an electronic device, the device only needs to receive a light field image set for subsequent processing. It may be connected with a light field camera array to form a system, or it may be an independent device that can be combined with a light field camera array of any location, model or camera count, without a fixed binding relation.
Step S302: refocusing the light field image of any view angle at different depths to form a focusing image set, and carrying out semantic analysis on each focusing image to obtain focusing semantic segmentation of different individuals.
In an embodiment of the application, during refocusing the different focus depths are selected by averaging, i.e. by dividing the depth range evenly.
Specifically, the different focus depths selected during refocusing follow a uniform division of the depth range:

d_i = d_min + ((i − 1) / (N − 1)) · (d_max − d_min), i ∈ [1, …, N]

where N is the number of sampled focus depths, d_i is the i-th focus depth, d_max (dsmax) is the maximum depth value, and d_min (dsmin) is the minimum depth value.
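Under the assumption that selecting depths "according to the average value" means dividing the depth range [dsmin, dsmax] evenly, the sampling can be sketched as:

```python
def focus_depths(d_min, d_max, n):
    """Evenly sample n focus depths d_i in [d_min, d_max]. Uniform
    division is one reading of the application's 'selected according
    to the average value' and is an assumption here."""
    if n == 1:
        return [(d_min + d_max) / 2.0]
    step = (d_max - d_min) / (n - 1)
    return [d_min + i * step for i in range(n)]
```

Each sampled depth produces one refocused image, so together they form the focal stack of n images.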
In this embodiment, the selected viewing angle is preferably the central viewing angle.
In an embodiment of the present application, the method of performing semantic analysis on each focus image to obtain the focus semantic segmentation of different individuals includes:
A. finding each individual by target detection, and applying a semantic label and a segmentation to each individual via its bounding box;
B. calculating the semantic confidence and degree of focus corresponding to each individual, and thereby obtaining the quality distribution of the focus semantic segmentation corresponding to the focus image.
In this embodiment, the semantic segmentation is image semantic segmentation: each pixel of the image is labeled with its corresponding class according to semantic features, and semantic segmentation of the target scene separates each individual instance in the scene from the scene.
For example, with a pre-constructed segmentation model based on a convolutional neural network, a depth image and a color image of a sample scene are taken as the input of the semantic segmentation model and the result of manual semantic segmentation of the sample image as the output; the model is trained and the optimal parameters of each of its layers are determined.
Specifically, the scene depth map of the target scene is used as the input of the preset semantic segmentation model to obtain the semantic segmentation result of the target scene. On the basis of this segmentation result, each individual is assigned its corresponding pixel region (P) and its corresponding bounding box.
In the present embodiment, for an individual k on the focus image S_i, denote its detection box by B_k^i, its segmentation map by M_k^i, the corresponding semantic confidence by c_k^i, and its degree of focus by f_k^i. The quality distribution Q_i of the focus semantic segmentation of the focus image S_i combines the semantic confidence and the degree of focus of each individual over its mask, where P is the image region of the focus picture S_i belonging to instance k, the mask corresponding to k is a binary 0/1 matrix, and the combination is applied pixel by pixel.
where (s_0, t_0) is the viewpoint of the preset viewing angle on the s-t plane, (s_i, t_i) is the viewpoint of another viewing angle, and the associated pixel set contains, for a pixel p at the (s_0, t_0) viewing angle, the corresponding pixels at the other viewing angles.
The measure defocus(p) is proposed to address ghosting, defocus and similar phenomena in light field refocusing: it evaluates the differences among the pixel values sampled at the different viewing angles and thereby detects whether ghosting or similar artifacts occur.
In one or more embodiments, it is preferable to use the difference between the pixel value at the preset viewing angle and those at the closest viewing angles. The refocusing process is thus constrained by the differences among the pixel values sampled at the different viewing angles; specifically, by the variance between the refocused viewing angle's pixel value and the corresponding pixel values in the closest viewing angles.
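One reading of this variance-based measure, sketched for a single pixel. The exact defocus(p) formula is not reproduced in the text, so taking the variance of the pixel's corresponding values across views is an assumption:

```python
def defocus(pixel_values):
    """Focus measure for one pixel: variance of the values sampled at
    corresponding positions in the different views. Low variance means
    the views agree, i.e. the pixel is in focus at this depth; high
    variance signals ghosting or defocus. One reading of defocus(p)."""
    n = len(pixel_values)
    mean = sum(pixel_values) / n
    return sum((v - mean) ** 2 for v in pixel_values) / n
```

A pixel whose corresponding samples are identical across views scores 0 (perfectly in focus); disagreeing samples score higher.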
It should be noted that the results obtained with the above defocus(p) formula may be clustered with a mean shift clustering algorithm, because the number of classes of individuals is not known in advance, that is, K is unknown.
The heart of the mean shift algorithm is to move each point in the dataset continuously toward the density center within a given neighborhood range. Finally, nearby density centers are merged to obtain the final classification; its advantage is that the number of classes need not be preset.
The mean shift algorithm is also used to determine whether two individuals belong to the same class.
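The mean shift procedure described above can be sketched in a few lines; this is a minimal flat-kernel version for illustration (the bandwidth value and merge tolerance are assumptions, not the patent's parameters):

```python
import numpy as np

def mean_shift(points, bandwidth, iters=50, tol=1e-3):
    """Minimal flat-kernel mean shift: each point repeatedly moves to
    the mean of the data points within `bandwidth` of it, then the
    converged positions ("density centers") are merged when close.
    Returns (modes, labels); the number of clusters is not preset,
    matching the property described above."""
    pts = np.asarray(points, dtype=float)
    shifted = pts.copy()
    for _ in range(iters):
        moved = False
        for i, p in enumerate(shifted):
            neigh = pts[np.linalg.norm(pts - p, axis=1) <= bandwidth]
            new_p = neigh.mean(axis=0)
            if np.linalg.norm(new_p - p) > tol:
                shifted[i] = new_p
                moved = True
        if not moved:
            break
    # Merge similar density centers into final cluster modes.
    modes, labels = [], []
    for p in shifted:
        for j, m in enumerate(modes):
            if np.linalg.norm(p - m) <= bandwidth / 2:
                labels.append(j)
                break
        else:
            modes.append(p)
            labels.append(len(modes) - 1)
    return np.array(modes), labels

pts = [[0, 0], [0.2, 0.1], [5, 5], [5.1, 4.9]]
modes, labels = mean_shift(pts, bandwidth=1.0)
print(len(modes), labels)  # two clusters found without presetting K
```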
In addition, it should be noted that instance semantic segmentation (or individual semantic segmentation) is a subtype of semantic segmentation that locates and semantically segments each target individual at the same time; each target is an independent instance (individual), and the task is ultimately evaluated by the segmentation accuracy of each instance. For example, under plain semantic segmentation, all humans are labeled with the same class. Under instance (individual) semantic segmentation, each distinct individual in space has one and only one unique classification: two different people are labeled as different individuals.
For example, the instance segmentation method may be any one of: Mask R-CNN, SDS, Hypercolumns, CFM, DeepMask & SharpMask, MNC, ISFCN, FCIS, SIS, and PAN.
Compared with traditional monocular semantic analysis, semantic analysis of the focused light field can resolve cases where semantic analysis is inaccurate or seriously wrong under complex occlusion. Our focused light field semantic analysis is built at the individual level, i.e., there is one and only one unique classification for each distinct individual in space.
Step S303: clustering the individuals belonging to the same class according to the result of each focusing semantic segmentation to obtain the depth value range of each individual on different focusing images.
In an embodiment of the present application, the method for clustering the individuals belonging to the same class according to the result of each focused semantic segmentation includes:
A. For any two individuals, roughly estimate their similarity according to the difference between their corresponding bounding boxes and the difference between their depth values, so as to judge whether the two individuals belong to the same class.
In this embodiment, since the semantic analysis of each focused image is performed separately, after the semantic analysis we cluster the individuals detected on the different focused pictures by mean shift clustering.
Specifically, the similarity of two individuals is defined in terms of their detection boxes and depths (the formula appears only as an image in the source), where d i denotes the depth of the focused image S i on which the individual lies, and the individual's detection box on S i is also used.
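The similarity formula itself is omitted in the source; a hedged sketch of one plausible form — bounding-box overlap weighted by a Gaussian distance on the depths, consistent with the Gaussian depth modeling mentioned below — could be:

```python
import math

def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def similarity(box_a, d_a, box_b, d_b, sigma=1.0):
    """Rough similarity of two detected individuals: bounding-box
    overlap weighted by a Gaussian distance on the depths d_a, d_b of
    the focused images they lie on.  The exact combination and sigma
    are assumptions for illustration."""
    depth_term = math.exp(-((d_a - d_b) ** 2) / (2 * sigma ** 2))
    return box_iou(box_a, box_b) * depth_term

same = similarity((0, 0, 10, 10), 2.0, (1, 1, 11, 11), 2.1)
diff = similarity((0, 0, 10, 10), 2.0, (40, 40, 50, 50), 9.0)
print(same > diff)  # overlapping boxes at similar depth score higher
```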
In an embodiment of the application, the method further comprises:
B. and taking the depth value range of each individual on different focusing images as the depth value range of each individual after clustering.
C. Select the individual with the highest quality distribution within each cluster, obtain the semantic information corresponding to that individual through semantic analysis, and take the obtained depth value range corresponding to the individual as the result of the focused semantic segmentation of that individual.
Further, since the same individual may appear on different focused images with different degrees of defocus, the method models the depth difference between different individuals with a Gaussian distance.
That is, each cluster represents one and the same individual; its depth range is the range of depths contained in the cluster. Within each cluster, the individual with the highest quality distribution Q(k, S i) is selected, and its semantic information and depth d i are included in the set of focused semantic segmentation results.
Step S304: and carrying out semantic analysis on the light field images under other visual angles to obtain original semantic segmentation of the individuals corresponding to different visual angles.
In this embodiment, semantic analysis is performed on the light field images under the other viewing angles; the principle and process are similar to the semantic analysis in step S302 and are therefore not repeated here.
In this embodiment, semantic analysis of the light field image under each other viewing angle yields a set of semantic segmentations of the individual i under the viewing angle V(s, t).
Step S305: and mapping the result of the focusing semantic segmentation corresponding to each individual to each view angle through reprojection according to the depth value range corresponding to each individual on the focusing image set so as to form mapping semantic segmentation.
In an embodiment of the present application, the specific method in step S305 includes:
A. For each pixel in the light field image under each other target viewing angle, find the corresponding pixels of the focused image set at the different focusing depths under the refocusing viewing angle.
In this embodiment, the focused semantic analysis described in step S302 yields the approximate depth d i and the rough positional relationship of each individual having semantic information in the scene; semantic correspondences under the different viewing angles can therefore be acquired by reprojection according to the individual depth d i. Define the reprojection transform from the focusing viewing angle (preferably the center viewing angle) to the remaining viewing angles as H; the center-view pixel corresponding to a pixel under the target viewing angle is then obtained by applying H at depth d i (the exact formula appears only as an image in the source).
B. Select the semantic information that has the smallest focusing depth and does not belong to the background as the semantic information of the current pixel under the target viewing angle.
The reprojection process finds, for each pixel in the target view, its corresponding center-view pixels p in the representation set at the different focusing depths d. We take the semantic value on the focused picture with the smallest focusing depth whose semantics do not belong to the background classification as the semantic value of the current pixel.
In this way, the focused semantic analysis result is mapped to each viewing angle through reprojection according to the depths of the focused pictures in the representation set, forming the mapped semantic segmentation.
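The per-pixel selection rule above — take the label at the smallest focusing depth whose semantics are not background — can be sketched as follows; the per-depth label lookup (i.e., the reprojection through H) is assumed to be precomputed:

```python
BACKGROUND = 0  # assumed label for the background class

def map_semantics(labels_by_depth):
    """labels_by_depth: list of (focus_depth, label) pairs giving, for
    one target-view pixel, the label found at the reprojected
    center-view pixel on each focused picture.  Returns the label of
    the nearest (smallest-depth) non-background hit, which models
    occlusion: closer individuals hide farther ones."""
    for depth, label in sorted(labels_by_depth):
        if label != BACKGROUND:
            return label
    return BACKGROUND

# Depth 1.0 is background, depth 2.5 first hits individual 3,
# depth 4.0 would hit individual 7 but is occluded by 3.
print(map_semantics([(4.0, 7), (1.0, 0), (2.5, 3)]))  # 3
```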
Step S306: and obtaining the corresponding relation of each individual under each view angle through the similarity comparison of the original semantic segmentation and the mapping semantic segmentation.
In this embodiment, the correspondence of each individual under each viewing angle is obtained by comparing the similarity between the original semantic segmentation set and the mapped semantic segmentation set.
In an embodiment of the present application, the specific method in step S306 includes:
A. comparing the similarity comparison value of the original semantic segmentation and the mapping semantic segmentation with a preset value;
B. if the similarity comparison value is smaller than the preset value, selecting the result of mapping semantic segmentation to represent the semantic segmentation result corresponding to the individual under each view angle;
C. according to the semantic segmentation result of each individual under each view angle, the corresponding relation of each individual under each view angle is obtained.
The similarity comparison value SIM is computed between the original semantic segmentation and the mapped semantic segmentation (the formula appears only as an image in the source).
If SIM is smaller than SIM thresh, the mapped segmentation is marked as representing the corresponding individual,
where SIM thresh is the preset value.
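The concrete SIM measure is omitted in the source; the sketch below assumes a mask-disagreement ratio (smaller = more similar), consistent with the "smaller than the preset value" acceptance rule above — both the measure and the threshold value are illustrative assumptions:

```python
import numpy as np

SIM_THRESH = 0.3  # preset value (assumed for illustration)

def sim(mask_orig, mask_mapped):
    """Comparison value between an original-view segmentation mask
    and the mapped segmentation mask: fraction of disagreeing pixels
    over their union.  0 means identical masks."""
    a = np.asarray(mask_orig, bool)
    b = np.asarray(mask_mapped, bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_xor(a, b).sum() / union)

def match(mask_orig, mask_mapped):
    """Accept the mapped segmentation as representing this individual
    when the comparison value is below the preset threshold."""
    return sim(mask_orig, mask_mapped) < SIM_THRESH

a = np.array([[1, 1, 0], [1, 1, 0]])
b = np.array([[1, 1, 0], [1, 0, 0]])
print(sim(a, b), match(a, b))  # 0.25 True: small disagreement, matched
```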
In an embodiment of the present application, step S306 further includes: the depth value range of each individual corresponds to a parallax range that can be used as the parallax search interval in stereo matching, so as to reduce the computation time of stereo matching.
In the present embodiment, the individuals under the different viewing angles are classified accordingly, yielding the correspondence of individuals across viewing angles, and the parallax range corresponding to the clustered depth range is used as the parallax search interval in stereo matching.
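For a rectified camera pair, the mapping from a clustered depth range to a disparity search interval follows the standard relation disparity = baseline × focal / depth; a sketch (the baseline and focal values are hypothetical):

```python
def disparity_interval(depth_min, depth_max, baseline, focal):
    """Convert an individual's clustered depth range [depth_min,
    depth_max] into a disparity search interval for stereo matching,
    using d = baseline * focal / depth for rectified cameras.
    Larger depth gives smaller disparity, so the bounds swap."""
    assert 0 < depth_min <= depth_max
    return (baseline * focal / depth_max, baseline * focal / depth_min)

# Individual clustered at 2-4 m depth, baseline 0.1 m, focal 1000 px:
lo, hi = disparity_interval(2.0, 4.0, baseline=0.1, focal=1000.0)
print(lo, hi)  # search only disparities in [25, 50] pixels
```

Restricting the matcher to this interval, instead of the full disparity range, is what yields the computation-time saving the paragraph above describes.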
In summary, the individual matching method based on light field semantics mainly uses focused semantic segmentation and light field semantics to obtain the correspondence of semantic individuals under multiple viewing angles, and limits the search range to the same region during stereo matching, thereby greatly shortening the matching computation time.
Fig. 6 shows a block diagram of an electronic device according to an embodiment of the application. As shown, the apparatus 600 includes:
an acquisition module 601, configured to acquire a light field image set including different perspectives;
the processing module 602 is configured to select a light field image of any view to refocus at different depths to form a focused image set, and perform semantic analysis on each focused image to obtain focused semantic segmentation of different individuals; clustering the individuals belonging to the same class according to the result of each focusing semantic segmentation to obtain depth value ranges of each individual on different focusing images; carrying out semantic analysis on the light field images under other view angles to obtain original semantic segmentation of the individuals corresponding to different view angles; mapping the result of the focusing semantic segmentation corresponding to each individual to each view angle through reprojection according to the depth value range corresponding to each individual on the focusing image set so as to form mapping semantic segmentation; and obtaining the corresponding relation of each individual under each view angle through the similarity comparison of the original semantic segmentation and the mapping semantic segmentation.
It should be noted that, since the information interaction and execution processes between the modules/units of the above apparatus are based on the same concept as the method embodiments of the present application, their technical effects are the same as those of the method embodiments; for specifics, refer to the description of the foregoing method embodiments, which is not repeated here.
It should be further understood that the division of the modules of the above apparatus is merely a division by logical function; in actual implementation they may be fully or partially integrated into one physical entity, or physically separated. These units may all be implemented in the form of software invoked by a processing element; or all in hardware; or partly in software invoked by a processing element and partly in hardware. For example, the processing module 602 may be a separately arranged processing element, may be integrated in a chip of the above apparatus, or may be stored in the memory of the above apparatus in the form of program code to be invoked by a processing element of the above apparatus to execute the functions of the processing module 602. The implementation of the other modules is similar. In addition, all or part of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal-processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. For another example, these modules may be integrated together and implemented in the form of a system-on-chip (SoC).
Fig. 7 shows a schematic structural diagram of an electronic device according to an embodiment of the application. As shown, the electronic device 700 includes: a memory 701, a processor 702, and a communicator 703; the memory 701 is used to store computer instructions; the processor 702 executes the computer instructions to implement the method described in fig. 2.
In some embodiments, the number of memories 701 in the electronic device 700 may be one or more, the number of processors 702 may be one or more, and the number of communicators 703 may be one or more; fig. 7 takes one of each as an example.
In an embodiment of the present application, the processor 702 in the electronic device 700 loads one or more instructions corresponding to the processes of the application program into the memory 701 according to the steps described in fig. 2, and the processor 702 executes the application instructions stored in the memory 701, thereby implementing the method described in fig. 2.
In some embodiments, the external device to which the communicator 703 is communicatively connected may be a light field camera array.
The memory 701 may include a random access memory (Random Access Memory, abbreviated as RAM) or may include a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory. The memory 701 stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operating instructions may include various operating instructions for performing various operations. The operating system may include various system programs for implementing various underlying services and handling hardware-based tasks.
The processor 702 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The communicator 703 is configured to implement a communication connection between the database access apparatus and other devices (e.g., clients, read-write libraries, and read-only libraries). The communicator 703 may comprise one or more sets of modules for different communication means, for example a CAN communication module communicatively coupled to a CAN bus. The communication connection may be one or more wired/wireless communication means and combinations thereof, including any one or more of: the Internet, CAN, an intranet, a wide area network (WAN), a local area network (LAN), a wireless network, a digital subscriber line (DSL) network, a frame relay network, an asynchronous transfer mode (ATM) network, a virtual private network (VPN), and/or any other suitable communication network; for example, any one or more of WIFI, Bluetooth, NFC, GPRS, GSM, and Ethernet.
In some specific applications, the various components of the electronic device 700 are coupled together by a bus system, which may include a power bus, a control bus, a status signal bus, etc. in addition to a data bus; for clarity of illustration, however, the various buses are all labeled as the bus system in fig. 7.
In one embodiment of the application, the application provides a non-transitory computer readable storage medium having stored thereon computer instructions that when executed by a processor implement the method as described in fig. 2.
As those of ordinary skill in the art will appreciate regarding the computer-readable storage medium: all or part of the steps implementing the above system and unit functions may be completed by hardware related to a computer program. The aforementioned computer program may be stored in a computer-readable storage medium; when executed, the program performs the embodiments including the functions of the system and units. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
In summary, the individual matching method, apparatus, device and medium based on light field semantics provided by the application acquire a light field image set containing different viewing angles; refocus the light field image of any viewing angle at different depths to form a focused image set, and perform semantic analysis on each focused image to obtain the focused semantic segmentation of different individuals; cluster the individuals belonging to the same class according to the result of each focused semantic segmentation to obtain the depth value range of each individual on the different focused images; perform semantic analysis on the light field images under the other viewing angles to obtain the original semantic segmentation of the individuals under the different viewing angles; map the result of the focused semantic segmentation of each individual to each viewing angle through reprojection according to the depth value range of each individual on the focused image set to form the mapped semantic segmentation; and obtain the correspondence of each individual under each viewing angle through similarity comparison of the original semantic segmentation and the mapped semantic segmentation.
The application effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and effects of the present application and are not intended to limit it. Anyone skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations completed by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the application shall still be covered by the appended claims.

Claims (9)

1. An individual matching method based on light field semantics, the method comprising:
acquiring a light field image set containing different visual angles;
refocusing the light field images of any view angle at different depths to form a focusing image set, and performing semantic analysis on each focusing image to obtain focusing semantic segmentation of different individuals; comprising the following steps: finding each individual according to target detection, and carrying out semantic marking and segmentation on each individual through a boundary box; calculating semantic confidence and degree of focus corresponding to each individual, and accordingly obtaining quality distribution of focusing semantic segmentation corresponding to the focusing image;
Clustering the individuals belonging to the same class according to the result of each focusing semantic segmentation to obtain depth value ranges of each individual on different focusing images, wherein the clustering comprises the following steps: optionally, roughly estimating the similarity of the two individuals according to the difference between the boundary frames and the difference between the depth values corresponding to the individuals so as to judge whether the two individuals belong to the same class; taking the depth value range of each individual on different focusing images as the depth value range of each clustered individual; selecting an individual with highest quality distribution in the individual clusters, obtaining semantic information corresponding to the individual through semantic analysis, and obtaining a depth value range corresponding to the individual as a result of focusing semantic segmentation of each individual;
carrying out semantic analysis on the light field images under other view angles to obtain original semantic segmentation of the individuals corresponding to different view angles;
mapping the result of the focusing semantic segmentation corresponding to each individual to each view angle through reprojection according to the depth value range corresponding to each individual on the focusing image set so as to form mapping semantic segmentation;
And obtaining the corresponding relation of each individual under each view angle through the similarity comparison of the original semantic segmentation and the mapping semantic segmentation.
2. The method of claim 1, wherein the set of light field images is comprised of multi-view images captured by an array of cameras corresponding to a scene.
3. The method of claim 1, wherein the refocusing is performed by selecting different focusing depths at equal average intervals.
4. The method of claim 1, wherein the mapping the result of the focused semantic segmentation for each individual to each view angle by re-projection according to the depth value range for each individual on the set of focused images comprises:
finding out pixels of the focusing image set under the refocusing view angle under different focusing depths for each pixel in the light field image under other target view angles;
and selecting semantic information which has the smallest focusing depth and does not belong to the background as the semantic information of the current pixel under the target visual angle.
5. The method according to claim 1, wherein the method for obtaining the correspondence of each individual at each view angle by comparing the similarity of the original semantic segmentation and the mapped semantic segmentation comprises:
Comparing the similarity comparison value of the original semantic segmentation and the mapping semantic segmentation with a preset value;
if the similarity comparison value is smaller than the preset value, selecting the result of mapping semantic segmentation to represent the semantic segmentation result corresponding to the individual under each view angle;
according to the semantic segmentation result of each individual under each view angle, the corresponding relation of each individual under each view angle is obtained.
6. The method of claim 5, wherein the range of depth values for each individual corresponds to a range of disparities that can be used as a disparity search interval for stereo matching to reduce the computation time for the stereo matching.
7. An electronic device, the device comprising:
the acquisition module is used for acquiring a light field image set containing different visual angles;
the processing module is used for selecting the light field image of any view angle to refocus at different depths to form a focusing image set, and carrying out semantic analysis on each focusing image to obtain focusing semantic segmentation of different individuals; comprising the following steps: finding each individual according to target detection, and carrying out semantic marking and segmentation on each individual through a boundary box; calculating semantic confidence and degree of focus corresponding to each individual, and accordingly obtaining quality distribution of focusing semantic segmentation corresponding to the focusing image;
Clustering the individuals belonging to the same class according to the result of each focusing semantic segmentation to obtain depth value ranges of each individual on different focusing images, wherein the clustering comprises the following steps: optionally, roughly estimating the similarity of the two individuals according to the difference between the boundary frames and the difference between the depth values corresponding to the individuals so as to judge whether the two individuals belong to the same class; taking the depth value range of each individual on different focusing images as the depth value range of each clustered individual; selecting an individual with highest quality distribution in the individual clusters, obtaining semantic information corresponding to the individual through semantic analysis, and obtaining a depth value range corresponding to the individual as a result of focusing semantic segmentation of each individual;
carrying out semantic analysis on the light field images under other view angles to obtain original semantic segmentation of the individuals corresponding to different view angles; mapping the result of the focusing semantic segmentation corresponding to each individual to each view angle through reprojection according to the depth value range corresponding to each individual on the focusing image set so as to form mapping semantic segmentation; and obtaining the corresponding relation of each individual under each view angle through the similarity comparison of the original semantic segmentation and the mapping semantic segmentation.
8. An electronic device, the device comprising: a memory, a processor, and a communicator; the memory is used for storing computer instructions; the processor being operative to execute computer instructions to implement the method of any one of claims 1 to 6; the communicator communicates with a connected external device.
9. A non-transitory computer readable storage medium storing computer instructions which, when executed, perform the method of any one of claims 1 to 6.
CN201910361188.5A 2019-04-30 2019-04-30 Individual matching method, device, equipment and medium based on light field semantics Active CN111862098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910361188.5A CN111862098B (en) 2019-04-30 2019-04-30 Individual matching method, device, equipment and medium based on light field semantics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910361188.5A CN111862098B (en) 2019-04-30 2019-04-30 Individual matching method, device, equipment and medium based on light field semantics

Publications (2)

Publication Number Publication Date
CN111862098A CN111862098A (en) 2020-10-30
CN111862098B true CN111862098B (en) 2023-11-24

Family

ID=72965696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910361188.5A Active CN111862098B (en) 2019-04-30 2019-04-30 Individual matching method, device, equipment and medium based on light field semantics

Country Status (1)

Country Link
CN (1) CN111862098B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721222B (en) * 2023-08-10 2023-10-31 清华大学 Large-scale light field semantic driving intelligent characterization and real-time reconstruction method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2887642A2 (en) * 2013-12-23 2015-06-24 Nokia Corporation Method, apparatus and computer program product for image refocusing for light-field images
US9454819B1 (en) * 2015-06-03 2016-09-27 The United States Of America As Represented By The Secretary Of The Air Force System and method for static and moving object detection
CN106101522A (en) * 2015-04-30 2016-11-09 汤姆逊许可公司 Use the method and apparatus that non-optical field imaging equipment obtains light field data
CN106768325A (en) * 2016-11-21 2017-05-31 清华大学 Multispectral light-field video acquisition device
CN107862698A (en) * 2017-11-29 2018-03-30 首都师范大学 Light field foreground segmentation method and device based on K mean cluster
WO2018072817A1 (en) * 2016-10-18 2018-04-26 Photonic Sensors & Algorithms, S.L. A device and method for obtaining distance information from views
US10057498B1 (en) * 2013-03-15 2018-08-21 Cognex Corporation Light field vision system camera and methods for using the same
CN108596965A (en) * 2018-03-16 2018-09-28 天津大学 A kind of light field image depth estimation method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014043641A1 (en) * 2012-09-14 2014-03-20 Pelican Imaging Corporation Systems and methods for correcting user identified artifacts in light field images
ITUB20153277A1 (en) * 2015-08-28 2017-02-28 St Microelectronics Srl PROCEDURE FOR VISUAL VISA, SYSTEM, EQUIPMENT AND COMPUTER PRODUCT

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10057498B1 (en) * 2013-03-15 2018-08-21 Cognex Corporation Light field vision system camera and methods for using the same
EP2887642A2 (en) * 2013-12-23 2015-06-24 Nokia Corporation Method, apparatus and computer program product for image refocusing for light-field images
CN106101522A (en) * 2015-04-30 2016-11-09 汤姆逊许可公司 Use the method and apparatus that non-optical field imaging equipment obtains light field data
US9454819B1 (en) * 2015-06-03 2016-09-27 The United States Of America As Represented By The Secretary Of The Air Force System and method for static and moving object detection
WO2018072817A1 (en) * 2016-10-18 2018-04-26 Photonic Sensors & Algorithms, S.L. A device and method for obtaining distance information from views
CN106768325A (en) * 2016-11-21 2017-05-31 清华大学 Multispectral light-field video acquisition device
CN107862698A (en) * 2017-11-29 2018-03-30 首都师范大学 Light field foreground segmentation method and device based on K mean cluster
CN108596965A (en) * 2018-03-16 2018-09-28 天津大学 A kind of light field image depth estimation method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Adaptive Affinity Fields for Semantic Segmentation;Tsung-Wei Ke等;ECCV 2018: Computer Vision;605–621 *
Matching Features Correctly through Semantic Understanding;N. Kobyshev 等;International Conference on 3D Vision;472-479 *
Light field imaging technology and its applications in computer vision; Zhang Chi; Liu Fei; Hou Guangqi; Sun Zhenan; Tan Tieniu; Journal of Image and Graphics (No. 03); 5-23 *
Research on depth estimation techniques for light field images based on convolutional neural networks; Luo Yaoxiang; China Master's Theses Full-text Database (Information Science and Technology) (No. (2019)01); I138-4267 *
Occlusion-resistant light field depth estimation algorithm with adaptive cost volume; Xiong Wei; Zhang Jun; Gao Xinjian; Zhang Xudong; Gao Jun; Journal of Image and Graphics (No. 12); 91-104 *

Also Published As

Publication number Publication date
CN111862098A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
JP6798183B2 (en) Image analyzer, image analysis method and program
US9147265B2 (en) System and method for rapid cluster analysis of hyperspectral images
CN111753698A (en) Multi-mode three-dimensional point cloud segmentation system and method
CN106165387A (en) Light field processing method
CN113486887B (en) Target detection method and device in three-dimensional scene
Wang et al. Mv-fcos3d++: Multi-view camera-only 4d object detection with pretrained monocular backbones
Hambarde et al. Single image depth estimation using deep adversarial training
CN116051736A (en) Three-dimensional reconstruction method, device, edge equipment and storage medium
US10567635B2 (en) Three dimensional moving pictures with a single imager and microfluidic lens
CN111914938A (en) Image attribute classification and identification method based on full convolution two-branch network
RU2608239C1 (en) Method and system for determining suitability of document image for optical character recognition and other image processing operations
CN109064444B (en) Track slab disease detection method based on significance analysis
CN111862098B (en) Individual matching method, device, equipment and medium based on light field semantics
CN111382753B (en) Light field semantic segmentation method, system, electronic terminal and storage medium
Farhood et al. 3D point cloud reconstruction from a single 4D light field image
CN111862106B (en) Image processing method, computer device and storage medium based on light field semantics
Ke et al. Scale-aware dimension-wise attention network for small ship instance segmentation in synthetic aperture radar images
Zhang et al. Light field salient object detection via hybrid priors
CN116228850A (en) Object posture estimation method, device, electronic equipment and readable storage medium
CN111508063A (en) Three-dimensional reconstruction method and system based on image
CN117315438B (en) Image color aesthetic evaluation method, device and equipment based on interest points
CN111383262A (en) Occlusion detection method, system, electronic terminal and storage medium
CN117422848B (en) Method and device for segmenting three-dimensional model
CN115564778B (en) Defect detection method and device, electronic equipment and computer readable storage medium
Xie et al. Automated co-superpixel generation via graph matching

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant