CN110751205A - Object association method, device, equipment and medium - Google Patents

Object association method, device, equipment and medium

Info

Publication number: CN110751205A
Authority: CN (China)
Legal status: Pending (assumed from the record; not a legal conclusion)
Application number: CN201910989414.4A
Other languages: Chinese (zh)
Inventors: 曹获, 刘博, 胡星
Current assignee (listing may be inaccurate): Apollo Intelligent Technology Beijing Co Ltd
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910989414.4A
Publication of CN110751205A

Classifications

    • G06V10/757 — Matching configurations of points or features
    • G06F18/22 — Matching criteria, e.g. proximity measures
    • G06F18/25 — Fusion techniques
    • G06V10/424 — Syntactic representation, e.g. by using alphabets or grammars


Abstract

The embodiment of the application discloses an object association method, device, equipment and medium, relates to the field of data processing, in particular to the field of computer vision, and can be used in the field of unmanned driving. The specific implementation scheme is as follows: determining the feature representation of each object according to the image data and the non-image data of the objects in the pair to be matched; and determining whether the objects in the pair to be matched are related according to those feature representations. The embodiment thereby improves the accuracy of object association.

Description

Object association method, device, equipment and medium
Technical Field
The embodiment of the application relates to the field of data processing, in particular to the field of computer vision, and can be used in the field of unmanned driving. Specifically, the embodiment of the application provides an object association method, device, equipment and medium.
Background
With the development of computer vision technology, multi-target tracking is applied more and more widely in the field of video monitoring and the field of unmanned driving. In practical scenarios, multi-camera multi-target tracking is most widely applied, and target association is a problem that must be solved by each multi-camera multi-target tracking algorithm.
There are three main traditional solutions:
(1) A tracking-result-based method that calculates the differences in the position, velocity and acceleration of the obstacle between the candidate matching pair. If all differences are smaller than the set difference thresholds, the two candidates are judged to be successfully associated.
(2) An appearance-feature-based method that compares the similarity between the images of the two objects, or between the feature maps of those images, and considers the two associable when the similarity exceeds a certain threshold.
(3) A hybrid method that weights the tracking information and the appearance information, compares the weighted sum with a threshold, and judges whether the two targets can be associated.
The three schemes suit different scenarios, and each brings its own problems and defects:
Scheme (1) requires highly accurate positioning and tracking information, and the differences can never be expected to vanish exactly, so a threshold is required; however, selecting that threshold well is difficult.
In scheme (2), two vehicles with similar appearance may easily be driving on the same road, and mismatches easily occur if the position information of the candidate match is not also taken into account.
Scheme (3) can effectively alleviate the problems of schemes (1) and (2), but the weights and the threshold are set by subjective experience, and there is no universal method to quickly obtain a set of thresholds for different scenes; meanwhile, a manually set threshold is unlikely to reach the globally optimal decision boundary for the association, which limits the performance of the algorithm to a certain extent.
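For contrast, the weighted-fusion baseline of scheme (3) can be sketched in a few lines. The similarity inputs, the weight and the threshold values below are illustrative assumptions, not values from the application:

```python
def weighted_fusion_associate(track_sim, appearance_sim, w=0.5, threshold=0.7):
    """Traditional scheme (3): fuse the tracking-based similarity and the
    appearance-based similarity with a hand-set weight w, then compare the
    weighted sum with a hand-set threshold. Both w and threshold are the
    subjectively tuned quantities the application argues are hard to choose."""
    score = w * track_sim + (1.0 - w) * appearance_sim
    return score > threshold

# Similar tracks and similar appearance: associated
assert weighted_fusion_associate(0.9, 0.8)      # 0.85 > 0.7
# Weak evidence on both cues: not associated
assert not weighted_fusion_associate(0.4, 0.5)  # 0.45 <= 0.7
```

The point of the application is precisely that `w` and `threshold` here have no principled setting and must be re-tuned per scene.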
Disclosure of Invention
The embodiment of the application provides an object association method, device, equipment and medium, so as to improve the accuracy of object association.
The embodiment of the application provides an object association method, which comprises the following steps:
determining the characteristic representation of the object according to the image data and the non-image data of the object in the object pair to be matched;
and determining whether the objects in the object pair to be matched are related or not according to the characteristic representation of the objects.
The embodiment of the application can realize the following technical effects: determining a characteristic representation of the object by determining from the image data and non-image data of the object; and determining whether the objects in the object pairs to be matched are related or not according to the characteristic representation of the objects, thereby realizing the association of the objects.
Compared with a scheme that separately determines the image-data correlation and the non-image-data correlation of the object and then performs weighted fusion of the two correlations based on fusion weights to determine whether the objects in the pair to be matched are related, the embodiment of the application omits the weighted-fusion step and avoids the use of fusion weights. Because subjective factors are introduced when determining fusion weights, the embodiment avoids the influence of those subjective factors on the fusion process, and thereby improves the accuracy of object association.
Further, the determining the feature representation of the object according to the image data and the non-image data of the object in the object pair to be matched includes:
determining an image feature representation of the image data and a non-image feature representation of the non-image data;
stitching the image feature representation and the non-image feature representation;
and determining the characteristic representation of the object according to the splicing result.
Based on these technical characteristics, the embodiment of the application can realize the following effects: by determining an image feature representation of the image data and a non-image feature representation of the non-image data, stitching the two representations, and determining the feature representation of the object from the stitching result, the determined feature representation of the object comprises both the image features and the non-image features of the object, that is, a relatively comprehensive representation of the object is achieved.
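As a minimal sketch, the stitching step is a plain concatenation of the two feature vectors. The dimensions below are illustrative assumptions; the application does not fix them:

```python
import numpy as np

def splice_features(image_feat: np.ndarray, non_image_feat: np.ndarray) -> np.ndarray:
    """Splice (concatenate) the image feature representation and the
    non-image feature representation into one joint feature vector."""
    return np.concatenate([image_feat, non_image_feat])

# Illustrative sizes: a 128-d image feature and an 8-d non-image feature
joint = splice_features(np.zeros(128), np.ones(8))
assert joint.shape == (136,)
assert joint[128] == 1.0  # the non-image part follows the image part
```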
Further, the determining an image feature representation of the image data comprises:
extracting image semantic features of the image data;
converting the image semantic features into the image feature representation.
Based on the technical characteristics, the embodiment of the application can realize the following effects: extracting image semantic features of the image data; and converting the image semantic features into the image feature representation, thereby realizing the determination of the image feature representation with the image semantic features.
Further, the determining whether the object in the pair of objects to be matched is related according to the feature representation of the object includes:
inputting the characteristic representation of one object in the object pair to be matched into one network in the twin networks, and inputting the characteristic representation of the other object in the object pair to be matched into the other network in the twin networks, wherein the network structures of the two networks in the twin networks are the same, and the network parameters are shared;
and determining whether the object in the object pair to be matched is related or not according to the output result of the twin network.
Based on the technical characteristics, the embodiment of the application can realize the following effects: the method comprises the steps of inputting the feature representations of two objects in a pair of objects to be matched into two network structures in a twin network respectively, representing the two objects in the pair of objects to be matched by using the two network structures in the twin network respectively, and measuring the spatial similarity between the two objects in the pair of objects to be matched through the Manhattan distance, the Euclidean distance, the cosine similarity and the like, so that the accurate association of the objects in the pair of objects to be matched is realized.
Further, the non-image data includes:
at least one of position information, velocity information, acceleration information, size information, and camera parameters for acquiring an image of the object.
Based on the technical characteristics, the embodiment of the application can realize the following effects: by integrating information of multiple dimensions, whether the objects in the object pair to be matched are related or not is judged from a relatively comprehensive angle, and the accuracy of object-related judgment is improved.
An embodiment of the present application further provides an object associating apparatus, including:
the characteristic representation determining module is used for determining the characteristic representation of the object according to the image data and the non-image data of the object in the object pair to be matched;
and the correlation determination module is used for determining whether the object in the object pair to be matched is correlated according to the characteristic representation of the object.
Further, the feature representation determination module includes:
a first determining unit for determining an image feature representation of the image data and a non-image feature representation of the non-image data;
a feature stitching unit for stitching the image feature representation and the non-image feature representation;
and the second determining unit is used for determining the characteristic representation of the object according to the splicing result.
Further, the first determining unit is specifically configured to:
extracting image semantic features of the image data;
converting the image semantic features into the image feature representation.
Further, the correlation determination module includes:
the characteristic input unit is used for inputting the characteristic representation of one object in the object pair to be matched into one network in the twin networks, and inputting the characteristic representation of the other object in the object pair to be matched into the other network in the twin networks, wherein the two networks in the twin networks have the same network structure and share network parameters;
and the correlation determination unit is used for determining whether the object in the object pair to be matched is correlated according to the output result of the twin network.
Further, the non-image data includes:
at least one of position information, velocity information, acceleration information, size information, and camera parameters for acquiring an image of the object.
An embodiment of the present application further provides an electronic device, where the device includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present application.
Embodiments of the present application also provide a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any of the embodiments of the present application.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a flowchart of an object association method according to a first embodiment of the present application;
FIG. 2 is a flow chart of an object association method according to a second embodiment of the present application;
FIG. 3 is a flow chart of a method for associating objects according to a third embodiment of the present application;
FIG. 4 is a schematic structural diagram of a twin network based on a hybrid input according to a fourth embodiment of the present application;
FIG. 5 is a schematic diagram of a network structure of a hybrid input model according to a fourth embodiment of the present application;
FIG. 6 is a schematic structural diagram of an object associating device according to a fifth embodiment of the present application;
fig. 7 is a block diagram of an electronic device of an object association method according to a sixth embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
First embodiment
Fig. 1 is a flowchart of an object association method according to a first embodiment of the present application. The embodiment can be applied to the case of correlating objects detected by different devices. Typically, the present embodiment is applicable to a case where objects detected by different cameras or different sensors are correlated. The method may be performed by an object association apparatus, which may be implemented by means of software and/or hardware. Referring to fig. 1, the object association method provided in this embodiment includes:
s110, determining the characteristic representation of the object according to the image data and the non-image data of the object in the object pair to be matched.
The object pair to be matched refers to two objects waiting to be associated.
The image data is an image of the object in the pair of objects to be matched. Specifically, an image of the object may be extracted from the captured image according to the position of the detection frame determined by the apparatus.
The non-image data is data describing a non-image form of the object in the pair of objects to be matched.
Specifically, the non-image data includes:
at least one of position information, velocity information, acceleration information, size information, and camera parameters for acquiring an image of the object.
Wherein the position information comprises the size of the position and/or the direction of the position.
The speed information includes the magnitude of the speed and/or the direction of the speed.
The acceleration information includes the magnitude of the acceleration and/or the direction of the acceleration.
The size information includes at least one of a length, a width, and a height.
The camera parameters include camera internal parameters and/or camera external parameters.
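Assembling these non-image attributes into one non-image data vector might look like the following; the field layout and dimensions are illustrative assumptions:

```python
import numpy as np

def build_non_image_vector(position, velocity, acceleration, size, camera_params):
    """Flatten the non-image attributes of one detected object into a
    single vector: position, velocity, acceleration, size (length, width,
    height), and the parameters of the camera that captured the image."""
    return np.concatenate([
        np.ravel(position),
        np.ravel(velocity),
        np.ravel(acceleration),
        np.ravel(size),
        np.ravel(camera_params),
    ])

v = build_non_image_vector(
    position=[1.0, 2.0, 0.0],
    velocity=[0.5, 0.0, 0.0],
    acceleration=[0.0, 0.1, 0.0],
    size=[4.5, 1.8, 1.5],              # a car-sized obstacle
    camera_params=np.eye(3).ravel(),   # e.g. a 3x3 intrinsic matrix
)
assert v.shape == (21,)                # 3 + 3 + 3 + 3 + 9
```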
The feature representation of the object refers to data representing the feature of the object.
Specifically, determining the feature representation of the object according to the image data and the non-image data of the object in the object pair to be matched comprises:
extracting object image features of the image data;
and splicing the object image characteristics and the non-image data to generate the characteristic representation of the object.
Wherein the object image feature is a feature of the object appearing in the image, expressed as data in non-image form.
And S120, determining whether the object in the object pair to be matched is related according to the characteristic representation of the object.
If the objects in the object pair to be matched are related, the two objects in the pair are determined to be the same object.
Specifically, determining whether the object in the pair of objects to be matched is related according to the feature representation of the object includes:
and comparing the characteristic representations of the objects, and determining whether the objects in the pair of the objects to be matched are related according to the comparison result.
Optionally, comparing the characteristic representation of the object comprises: and calculating the distance between the characteristic representations, and determining whether the objects in the pair of objects to be matched are related according to the calculated distance.
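The distance comparison can be sketched as follows. The choice of Euclidean distance for the decision, and the threshold value, are illustrative; the embodiments also mention Manhattan distance and cosine similarity:

```python
import numpy as np

def manhattan(a, b):
    return float(np.abs(a - b).sum())

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def is_related(feat_a, feat_b, threshold=1.0):
    """Judge the pair related when the two feature representations are
    close enough in feature space (smaller distance = more similar)."""
    return euclidean(feat_a, feat_b) < threshold

a = np.array([1.0, 0.0, 0.0])
assert is_related(a, np.array([0.9, 0.1, 0.0]))      # nearby: related
assert not is_related(a, np.array([0.0, 5.0, 0.0]))  # far away: unrelated
```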
According to the technical scheme of the embodiment of the application, the characteristic representation of the object is determined according to the image data and the non-image data of the object; and determining whether the objects in the object pairs to be matched are related or not according to the characteristic representation of the objects, thereby realizing the association of the objects.
Compared with a scheme that separately determines the image-data correlation and the non-image-data correlation of the object and then performs weighted fusion of the two correlations based on fusion weights to determine whether the objects in the pair to be matched are related, the embodiment of the application omits the weighted-fusion step and avoids the use of fusion weights. Because subjective factors are introduced when determining fusion weights, the embodiment avoids the influence of those subjective factors on the fusion process, and thereby improves the accuracy of object association.
Second embodiment
Fig. 2 is a flowchart of an object association method according to a second embodiment of the present application. The present embodiment is an alternative proposed on the basis of the above-described embodiments. Referring to fig. 2, the object association method provided in this embodiment includes:
s210, determining image characteristic representation of the image data and non-image characteristic representation of the non-image data.
The image feature representation refers to data representing the feature of the object image data.
The non-image feature representation refers to data representing non-image data features of an object.
Specifically, the determining an image feature representation of the image data comprises:
extracting image semantic features of the image data;
converting the image semantic features into the image feature representation.
The image semantic features refer to semantic information described by the image data.
Based on the technical characteristics, the embodiment of the application can realize the following effects: extracting image semantic features of the image data; and converting the image semantic features into the image feature representation, thereby improving the representation accuracy of the object.
Typically, extracting image semantic features of the image data comprises:
and carrying out convolution operation on the image data to obtain image semantic features.
Converting the image semantic features into the image feature representation, including:
and inputting the image semantic features into a full-connection network, and outputting the image feature representation.
Determining a non-image feature representation of the non-image data, comprising:
inputting the non-image data into a fully connected network and outputting the non-image feature representation.
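The pipeline of S210 can be sketched as below. A real implementation would use a trained CNN backbone; here a single hand-rolled convolution and one fully connected layer stand in for it, and all shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(image, kernel):
    """One 'valid' 2-D convolution, standing in for the convolution
    operations that extract image semantic features."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def fully_connected(x, weight, bias):
    """One fully connected layer with ReLU, converting semantic features
    (or raw non-image data) into a feature representation."""
    return np.maximum(weight @ x + bias, 0.0)

image = rng.standard_normal((8, 8))
semantic = conv2d_valid(image, rng.standard_normal((3, 3))).ravel()  # 6*6 = 36
image_feat = fully_connected(semantic, rng.standard_normal((16, 36)), np.zeros(16))
assert image_feat.shape == (16,)   # image feature representation

non_image = rng.standard_normal(21)
non_image_feat = fully_connected(non_image, rng.standard_normal((8, 21)), np.zeros(8))
assert non_image_feat.shape == (8,)  # non-image feature representation
```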
S220, splicing the image characteristic representation and the non-image characteristic representation.
And S230, determining the characteristic representation of the object according to the splicing result.
Specifically, determining the characteristic representation of the object according to the splicing result includes:
and inputting the splicing result into a full-connection network, and outputting the characteristic representation of the object.
S240, determining whether the object in the object pair to be matched is related according to the feature representation of the object.
According to the technical scheme of the embodiment of the application, image characteristic representation of the image data and non-image characteristic representation of the non-image data are determined; stitching the image feature representation and the non-image feature representation; and determining the characteristic representation of the object according to the splicing result, so that the determined characteristic representation of the object comprises the image characteristic of the object and the non-image characteristic of the object. I.e. to achieve a relatively comprehensive representation of the object.
Third embodiment
Fig. 3 is a flowchart of an object association method according to a third embodiment of the present application. The present embodiment is an alternative proposed on the basis of the above-described embodiments. Referring to fig. 3, the object association method provided in this embodiment includes:
s310, determining the characteristic representation of the object according to the image data and the non-image data of the object in the object pair to be matched.
S320, inputting the feature representation of one object in the object pair to be matched into one of the twin networks, and inputting the feature representation of the other object in the object pair to be matched into the other of the twin networks.
And the two networks in the twin network have the same network structure and the same network weight.
The twin network has two inputs which are fed into the two networks which respectively map the inputs to new spaces, forming a representation of the inputs in the new spaces. Through calculation of the loss function, the similarity of the two inputs is evaluated.
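A minimal sketch of this twin arrangement (the shapes and the tanh mapping are illustrative assumptions): both branches are the same function over the same weights, so each input is mapped into the same new space, where their similarity can be measured:

```python
import numpy as np

rng = np.random.default_rng(1)

# One weight matrix shared by both branches of the twin network.
W_shared = rng.standard_normal((4, 8))

def branch(x):
    """A twin-network branch: both inputs go through this same function,
    so the two branches share structure and weights by construction."""
    return np.tanh(W_shared @ x)

x1, x2 = rng.standard_normal(8), rng.standard_normal(8)
e1, e2 = branch(x1), branch(x2)      # the two inputs in the new space

distance = np.linalg.norm(e1 - e2)   # similarity of the two inputs
assert e1.shape == e2.shape == (4,)
assert np.allclose(branch(x1), e1)   # same input + shared weights = same output
```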
S330, determining whether the object in the object pair to be matched is related to the object according to the output result of the twin network.
According to the technical scheme of the embodiment of the application, the characteristic representations of the two objects in the object pair to be matched are respectively input into the two network structures in the twin network, the two network structures in the twin network are used for respectively representing the two objects in the object pair to be matched, and then the spatial similarity between the two objects in the object pair to be matched is measured through the Manhattan distance, the Euclidean distance or the cosine similarity, so that the accurate association of the objects in the object pair to be matched is realized.
Fourth embodiment
The present embodiment is an alternative, proposed on the basis of the above embodiments, in which the feature representation takes the form of a feature vector. The object association method provided by this embodiment comprises the following steps:
and acquiring image data and non-image data of the object to be matched.
Wherein the image data is an image of an object to be matched. The image is extracted according to the position of the detection frame.
The non-image data comprises the magnitude and direction of the position, the magnitude and direction of the velocity, the magnitude and direction of the acceleration, the length, width and height of the object to be matched, and the internal and external parameters of the camera that captured the image of the object.
Referring to fig. 4, image data X1 and non-image data Y1 of the first object in the pair of objects to be matched are input into one of the twin networks, denoted as SN (X1, Y1); the image data X2 and the non-image data Y2 of the second object in the pair of objects to be matched are input into the other network of the twin networks, noted as SN (X2, Y2).
Wherein the twin network applies the same calculation process to the inputs (X1, Y1) and (X2, Y2), i.e. SN(X1, Y1) and SN(X2, Y2) use the same weights; W denotes that the weights of the left and right networks are shared, i.e. identical.
Obtaining a feature vector of the first object through calculation of SN (X1, Y1); obtaining a feature vector of the second object through calculation of SN (X2, Y2);
The correlation between the feature vector of the first object and the feature vector of the second object is computed, denoted as ||SN(X1, Y1) - SN(X2, Y2)||.
When the correlation is greater than a set correlation threshold, the first object and the second object are determined to be related; otherwise they are determined to be unrelated.
Wherein SN (X1, Y1) and SN (X2, Y2) are mixed input models, see fig. 5, the implementation of the model network structure is described as follows:
and carrying out convolution operation on the image data of the object to be matched to obtain the semantic features of the image.
And inputting the semantic features of the image into a full-connection network, and outputting an image feature vector.
And inputting the non-image data into the full-connection network and outputting the non-image feature vector.
And splicing the image characteristic vector and the non-image characteristic vector, inputting a splicing result into a full-connection network, and outputting the characteristic vector of the object to be matched.
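Putting the pieces of fig. 5 together, the mixed-input model SN and the association decision can be sketched end to end. All shapes, the ReLU activations and the threshold are illustrative assumptions, and the convolutional front end is elided, with the image semantic features taken as given:

```python
import numpy as np

rng = np.random.default_rng(2)

# Shared parameters of the mixed-input model (shapes are illustrative).
W_img = rng.standard_normal((16, 36))  # FC over image semantic features
W_non = rng.standard_normal((8, 21))   # FC over non-image data
W_out = rng.standard_normal((10, 24))  # FC over the spliced 16 + 8 vector

def sn(image_semantic, non_image):
    """Mixed-input model SN(X, Y). Both twin branches call this same
    function, so all weights are shared between the branches."""
    img_feat = np.maximum(W_img @ image_semantic, 0.0)
    non_feat = np.maximum(W_non @ non_image, 0.0)
    spliced = np.concatenate([img_feat, non_feat])
    return W_out @ spliced  # feature vector of the object to be matched

def related(x1, y1, x2, y2, threshold=5.0):
    """Associate the pair when ||SN(X1, Y1) - SN(X2, Y2)|| is small."""
    return np.linalg.norm(sn(x1, y1) - sn(x2, y2)) < threshold

x1, y1 = rng.standard_normal(36), rng.standard_normal(21)
assert sn(x1, y1).shape == (10,)
assert related(x1, y1, x1, y1)  # identical observations always associate
```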
According to the embodiment of the application, on the one hand, the image data and the non-image data of the object to be matched are both used, giving an association result from a more comprehensive angle and guaranteeing association quality. On the other hand, the embodiment automatically learns a high-dimensional mapping and weighting of the features by way of a network, and gives a more accurate judgment result at a higher feature dimension.
Fifth embodiment
Fig. 6 is a schematic structural diagram of an object association device according to a fifth embodiment of the present application. The object associating apparatus 600 provided in the present embodiment includes: a feature representation determination module 601 and a correlation determination module 602.
The characteristic representation determining module 601 is configured to determine a characteristic representation of an object according to image data and non-image data of the object in a pair of objects to be matched;
a correlation determination module 602, configured to determine whether the object in the pair of objects to be matched is correlated according to the feature representation of the object.
According to the technical scheme of the embodiment of the application, the characteristic representation of the object is determined according to the image data and the non-image data of the object; and determining whether the objects in the object pairs to be matched are related or not according to the characteristic representation of the objects, thereby realizing the association of the objects.
Compared with a scheme that separately determines the image-data correlation and the non-image-data correlation of the object and then performs weighted fusion of the two correlations based on fusion weights to determine whether the objects in the pair to be matched are related, the embodiment of the application omits the weighted-fusion step and avoids the use of fusion weights. Because subjective factors are introduced when determining fusion weights, the embodiment avoids the influence of those subjective factors on the fusion process, and thereby improves the accuracy of object association.
Further, the feature representation determination module includes:
a first determining unit for determining an image feature representation of the image data and a non-image feature representation of the non-image data;
a feature stitching unit for stitching the image feature representation and the non-image feature representation;
and the second determining unit is used for determining the characteristic representation of the object according to the splicing result.
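As a minimal illustration of the stitching described above, the image feature representation and the non-image feature representation of one object can simply be concatenated into a single feature representation. All names and values below are illustrative assumptions, not code from the application.

```python
def stitch_features(image_repr, non_image_repr):
    """Concatenate the image feature representation with the non-image
    feature representation to form the object's overall representation."""
    return list(image_repr) + list(non_image_repr)

# Illustrative values: a 3-dim image representation and a 4-dim
# non-image representation (e.g. position, speed, size values).
image_repr = [0.12, -0.53, 0.88]
non_image_repr = [10.0, 2.5, 0.0, 1.8]
feature_repr = stitch_features(image_repr, non_image_repr)  # 7-dim result
```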
Further, the first determining unit is specifically configured to:
extracting image semantic features of the image data;
converting the image semantic features into the image feature representation.
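The conversion performed by the first determining unit can be pictured as a learned projection from image semantic features (e.g. the output of a convolutional backbone) to the image feature representation. The pure-Python sketch below assumes made-up projection weights; a real system would use trained parameters.

```python
def project(semantic_features, weights, bias):
    """Map extracted image semantic features to the image feature
    representation via a linear projection (weights/bias stand in
    for trained parameters and are illustrative only)."""
    return [
        sum(w * f for w, f in zip(row, semantic_features)) + b
        for row, b in zip(weights, bias)
    ]

semantic = [0.4, 1.2, -0.7]              # hypothetical backbone output
W = [[0.5, 0.0, 0.1], [0.0, 1.0, 0.0]]   # 2x3 projection, made-up values
b = [0.0, 0.1]
image_repr = project(semantic, W, b)     # 2-dim image feature representation
```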
Further, the correlation determination module includes:
the feature input unit is used for inputting the feature representation of one object in the object pair to be matched into one network of the twin network, and inputting the feature representation of the other object in the object pair to be matched into the other network of the twin network, wherein the two networks of the twin network have the same network structure and the same network weights;
and the correlation determination unit is used for determining whether the object in the object pair to be matched is correlated according to the output result of the twin network.
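A minimal sketch of the twin (Siamese) decision described above: both feature representations pass through networks with identical structure and identical weights, and relatedness is decided from the distance between the two embeddings. The single shared linear-plus-tanh layer, the exponential-of-distance similarity, and the threshold are illustrative choices, not the application's.

```python
import math

def mlp(x, weights):
    # One shared linear layer followed by tanh; both branches of the
    # twin network reuse the same `weights`, so structure and weights match.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def twin_network_related(feat_a, feat_b, weights, threshold=0.5):
    """Embed both feature representations with the shared network and
    judge relatedness from the Euclidean distance between embeddings."""
    emb_a = mlp(feat_a, weights)
    emb_b = mlp(feat_b, weights)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    similarity = math.exp(-dist)  # in (0, 1]; 1 means identical embeddings
    return similarity >= threshold

W = [[1.0, 0.0], [0.0, 1.0]]  # illustrative shared weights for 2-dim inputs
same = twin_network_related([0.3, -0.2], [0.3, -0.2], W)  # identical -> related
```

In a trained system the shared weights would come from learning on labeled related/unrelated pairs; sharing them guarantees the two branches map identical inputs to identical embeddings.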
Further, the non-image data includes:
at least one of position information, velocity information, acceleration information, size information, and camera parameters for acquiring an image of the object.
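Since "at least one of" means any subset of these attributes may be present, the non-image data can be flattened into one vector from whichever attributes are available. The helper below is a hypothetical sketch, not the application's representation.

```python
def build_non_image_vector(position=None, velocity=None, acceleration=None,
                           size=None, camera_params=None):
    """Flatten whichever non-image attributes are available into a single
    vector; any subset of the attributes may be supplied."""
    vec = []
    for part in (position, velocity, acceleration, size, camera_params):
        if part is not None:
            vec.extend(part)
    return vec

# e.g. an object observed with position and size only:
non_image_repr = build_non_image_vector(position=[10.0, 2.5, 0.0], size=[1.8])
```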
Sixth embodiment
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 7 is a block diagram of an electronic device for the object association method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 7, the electronic apparatus includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 7, one processor 701 is taken as an example.
The memory 702 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the object association methods provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the object association method provided herein.
The memory 702, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the object association methods in the embodiments of the present application (e.g., the feature representation determination module 601 and the correlation determination module 602 shown in fig. 6). The processor 701 runs the non-transitory software programs, instructions, and modules stored in the memory 702, thereby executing the various functional applications and data processing of the server, that is, implementing the object association method in the above-described method embodiment.
The memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required by at least one function, and the storage data area may store data created according to the use of the electronic device for object association, and the like. Further, the memory 702 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 702 optionally includes memory located remotely from the processor 701, and such remote memory may be connected to the electronic device for object association via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the object association method may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or other means, and fig. 7 illustrates an example of a connection by a bus.
The input device 703, such as a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, or joystick, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for object association. The output devices 704 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present application is not limited in this respect, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. An object association method, comprising:
determining the characteristic representation of the object according to the image data and the non-image data of the object in the object pair to be matched;
and determining whether the objects in the object pair to be matched are related or not according to the characteristic representation of the objects.
2. The method of claim 1, wherein determining the feature representation of the object from the image data and non-image data of the object in the pair of objects to be matched comprises:
determining an image feature representation of the image data and a non-image feature representation of the non-image data;
stitching the image feature representation and the non-image feature representation;
and determining the characteristic representation of the object according to the splicing result.
3. The method of claim 2, wherein the determining an image feature representation of the image data comprises:
extracting image semantic features of the image data;
converting the image semantic features into the image feature representation.
4. The method according to claim 1, wherein the determining whether the object in the pair of objects to be matched is related according to the feature representation of the object comprises:
inputting the characteristic representation of one object in the object pair to be matched into one network of the twin network, and inputting the characteristic representation of the other object in the object pair to be matched into the other network of the twin network, wherein the network structures of the two networks of the twin network are the same, and the network weights are the same;
and determining whether the object in the object pair to be matched is related or not according to the output result of the twin network.
5. The method of claim 1, wherein the non-image data comprises: at least one of position information, velocity information, acceleration information, size information, and camera parameters for acquiring an image of the object.
6. An object association apparatus, comprising:
the characteristic representation determining module is used for determining the characteristic representation of the object according to the image data and the non-image data of the object in the object pair to be matched;
and the correlation determination module is used for determining whether the object in the object pair to be matched is correlated according to the characteristic representation of the object.
7. The apparatus of claim 6, wherein the feature representation determination module comprises:
a first determining unit for determining an image feature representation of the image data and a non-image feature representation of the non-image data;
a feature stitching unit for stitching the image feature representation and the non-image feature representation;
and the second determining unit is used for determining the characteristic representation of the object according to the splicing result.
8. The apparatus according to claim 7, wherein the first determining unit is specifically configured to:
extracting image semantic features of the image data;
converting the image semantic features into the image feature representation.
9. The apparatus of claim 6, wherein the correlation determination module comprises:
the characteristic input unit is used for inputting the characteristic representation of one object in the object pair to be matched into one network of the twin network, and inputting the characteristic representation of the other object in the object pair to be matched into the other network of the twin network, wherein the two networks of the twin network have the same network structure and the same network weights;
and the correlation determination unit is used for determining whether the object in the object pair to be matched is correlated according to the output result of the twin network.
10. The apparatus of claim 6, wherein the non-image data comprises:
at least one of position information, velocity information, acceleration information, size information, and camera parameters for acquiring an image of the object.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN201910989414.4A 2019-10-17 2019-10-17 Object association method, device, equipment and medium Pending CN110751205A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910989414.4A CN110751205A (en) 2019-10-17 2019-10-17 Object association method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN110751205A (en) 2020-02-04

Family

ID=69278797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910989414.4A Pending CN110751205A (en) 2019-10-17 2019-10-17 Object association method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110751205A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140002650A1 (en) * 2012-06-28 2014-01-02 GM Global Technology Operations LLC Wide baseline binocular object matching method using minimal cost flow network
CN109272530A (en) * 2018-08-08 2019-01-25 北京航空航天大学 Method for tracking target and device towards space base monitoring scene
CN109635861A (en) * 2018-12-05 2019-04-16 百度在线网络技术(北京)有限公司 A kind of data fusion method, device, electronic equipment and storage medium
CN109711316A (en) * 2018-12-21 2019-05-03 广东工业大学 A kind of pedestrian recognition methods, device, equipment and storage medium again
CN109816701A (en) * 2019-01-17 2019-05-28 北京市商汤科技开发有限公司 A kind of method for tracking target and device, storage medium
CN109919974A (en) * 2019-02-21 2019-06-21 上海理工大学 Online multi-object tracking method based on the more candidate associations of R-FCN frame
CN110059807A (en) * 2019-04-26 2019-07-26 腾讯科技(深圳)有限公司 Image processing method, device and storage medium
CN110287874A (en) * 2019-06-25 2019-09-27 北京市商汤科技开发有限公司 Target tracking method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SANGYUN LEE等: "Multiple Object Tracking via Feature Pyramid Siamese Networks", 《IEEE ACCESS》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211018

Address after: 105 / F, building 1, No. 10, Shangdi 10th Street, Haidian District, Beijing 100085

Applicant after: Apollo Intelligent Technology (Beijing) Co.,Ltd.

Address before: 2 / F, baidu building, 10 Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.
RJ01 Rejection of invention patent application after publication

Application publication date: 20200204