WO2023134071A1

WO2023134071A1 - Person re-identification method and apparatus, electronic device and storage medium

Info

Publication number: WO2023134071A1
Application number: PCT/CN2022/089991
Authority: WO
Inventors: 郑喜民; 翟尤; 舒畅; 陈又新
Original assignee: 平安科技（深圳）有限公司
Priority date: 2022-01-12
Filing date: 2022-04-28
Publication date: 2023-07-20
Also published as: CN114359970A

Abstract

The present application relates to the technical field of artificial intelligence, and provides a person re-identification method and apparatus, an electronic device and a storage medium. The method comprises: inputting a first image sequence into a preset posture identification network so as to obtain a first local feature of each body part of a person to be identified; inputting the first image sequence into a preset multi-layer convolutional neural network so as to obtain a first global feature of said person; performing first fusion on the first local features of the plurality of body parts and the first global feature so as to obtain a second local feature of each body part; and inputting the second local features of the plurality of body parts of said person into a pre-trained person re-identification model, and outputting a person re-identification result. According to the present application, the accuracy of person re-identification is improved. The present application further relates to blockchain technology, and the first image sequence is stored in a blockchain node.

Description

Pedestrian re-identification method, device, electronic equipment and storage medium

This application claims the priority of the Chinese patent application with the application number 202210033877.5 filed on January 12, 2022, entitled "Pedestrian re-identification method, device, electronic equipment and storage medium", the entire content of which is incorporated by reference in this application.

Technical Field

The present application relates to the technical field of artificial intelligence, in particular to a pedestrian re-identification method, device, electronic equipment and storage medium.

Background

Person re-identification (Person re-ID), also known as pedestrian re-identification, is a technology that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence. It is widely regarded as a sub-problem of image retrieval. Existing video-based methods perform pose estimation on each frame and then carry out pedestrian re-identification.

However, owing to the complexity of human body movements, changes in illumination, background interference and the like, the inventors found that the existing technology cannot handle occlusion well when performing pose estimation on each frame, which results in low pedestrian re-identification accuracy.

Therefore, it is necessary to propose a method for fast and accurate person re-identification.

Summary of the Invention

This application proposes a pedestrian re-identification method, device, electronic equipment, and storage medium. By dividing the pedestrian to be identified into multiple body parts and performing separate calculations for each body part, the accuracy of pedestrian re-identification is improved.

The first aspect of the present application provides a pedestrian re-identification method, the method comprising:

Obtaining a first image sequence of the pedestrian to be identified, inputting the first image sequence into a preset pose recognition network, and obtaining a first local feature of each body part of the pedestrian to be identified;

Inputting the first image sequence into a preset multi-layer convolutional neural network to obtain the first global feature of the pedestrian to be identified;

performing a first fusion of the first local features of each of the body parts of the pedestrian to be identified with the first global feature to obtain a second local feature of each of the body parts of the pedestrian to be identified;

inputting the second local features of each body part of the pedestrian to be identified into a pre-trained pedestrian re-identification model, and receiving a pedestrian re-identification result output by the pedestrian re-identification model, wherein the pedestrian re-identification model contains multiple channel attention modules and multiple position attention modules.

A second aspect of the present application provides an electronic device comprising a memory and a processor, wherein the memory is configured to store at least one computer-readable instruction, and the processor is configured to execute the at least one computer-readable instruction to implement the steps of the pedestrian re-identification method of the first aspect.

A third aspect of the present application provides a computer-readable storage medium storing at least one computer-readable instruction which, when executed by a processor, implements the steps of the pedestrian re-identification method of the first aspect.

A fourth aspect of the present application provides a pedestrian re-identification device, the device comprising:

An acquisition module, configured to acquire a first image sequence of a pedestrian to be identified, input the first image sequence into a preset pose recognition network, and obtain a first local feature of each body part of the pedestrian to be identified;

A first input module, configured to input the first image sequence into a preset multi-layer convolutional neural network to obtain the first global feature of the pedestrian to be identified;

A fusion module, configured to perform a first fusion of the first local feature of each body part of the pedestrian to be identified with the first global feature, to obtain a second local feature of each body part of the pedestrian to be identified;

A second input module, configured to input the second local features of each body part of the pedestrian to be identified into a pre-trained pedestrian re-identification model, and receive a pedestrian re-identification result output by the pedestrian re-identification model, wherein the pedestrian re-identification model includes multiple channel attention modules and multiple position attention modules.

The pedestrian re-identification method, device, electronic equipment and storage medium described in this application improve the accuracy of pedestrian re-identification.

Brief Description of the Drawings

FIG. 1 is a flow chart of a pedestrian re-identification method provided in Embodiment 1 of the present application.

FIG. 2 is a structural diagram of a pedestrian re-identification device provided in Embodiment 2 of the present application.

FIG. 3 is a schematic structural diagram of an electronic device provided in Embodiment 3 of the present application.

Detailed Description

In order to more clearly understand the above objects, features and advantages of the present application, the present application will be described in detail below in conjunction with the accompanying drawings and specific embodiments. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments can be combined with each other.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terms used herein in the specification of the application are only for the purpose of describing specific embodiments, and are not intended to limit the application.

Embodiment one

In this embodiment, the pedestrian re-identification method can be applied to electronic devices. For an electronic device that requires pedestrian re-identification, the pedestrian re-identification function provided by the method of this application can be integrated directly on the electronic device, or can run on the electronic device in the form of a software development kit (SDK).

The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.

Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes several major directions such as computer vision technology, robot technology, biometric technology, speech processing technology, natural language processing technology, machine learning, and deep learning.

As shown in FIG. 1, the pedestrian re-identification method specifically includes the following steps. Depending on requirements, the order of the steps in the flow chart can be changed, and some steps can be omitted.

S11. Obtain a first image sequence of a pedestrian to be identified, and input the first image sequence into a preset pose recognition network to obtain a first local feature of each body part of the pedestrian to be identified, wherein the pedestrian to be identified includes multiple body parts.

In this embodiment, the first image sequence refers to multiple consecutive frame images extracted from the captured video of the pedestrian to be identified.
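The application does not say how many consecutive frames are taken or where the window starts. The sketch below only illustrates what "multiple consecutive frame images" means in practice. Its helper name and its `length`/`start` parameters are hypothetical.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def first_image_sequence(frames: Sequence[Frame], length: int, start: int = 0) -> List[Frame]:
    """Return `length` consecutive frames of a decoded video, starting at `start`.

    Hypothetical helper: the application does not state how many frames make up
    the first image sequence or where the window begins.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if start < 0 or start + length > len(frames):
        raise ValueError("requested window exceeds the available frames")
    # Slicing keeps the original temporal order, so the frames stay consecutive.
    return list(frames[start:start + length])
```

For example, `first_image_sequence(list(range(10)), 4, start=3)` returns `[3, 4, 5, 6]`. Real frames would be image arrays rather than integers.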

It should be emphasized that, to further ensure the privacy and security of the first image sequence, the first image sequence can also be stored in a blockchain node.

In this embodiment, the pose recognition network can be preset. For example, the pose recognition network can be an AlphaPose model. The AlphaPose model adopts the RMPE (Regional Multi-Person Pose Estimation) framework, which consists of a symmetric spatial transformer network (SSTN), parametric pose non-maximum suppression (PNMS) and a pose-guided proposals generator (PGPG). The AlphaPose model is prior art and is not described in detail in this embodiment.

In an optional embodiment, inputting the first image sequence into a preset pose recognition network to obtain the first local feature of each body part of the pedestrian to be identified includes:

The first image sequence is input into the preset pose recognition network, which detects each image in the first image sequence to extract the body parts of the pedestrian to be identified;

Acquiring the first position coordinates and the first confidence level of each body part of the pedestrian to be identified;

Performing vector transformation on the first position coordinates and the first confidence level of each body part of the pedestrian to be identified, to obtain the first local feature of the corresponding body part of the pedestrian to be identified.

In this embodiment, when extracting the first local features of the body parts of the pedestrian to be identified, 18 body parts can be preset, for example: nose, right eye, left eye, right ear, left ear, right shoulder, left shoulder, right elbow, left elbow, right wrist, left wrist, right hip, left hip, right knee, left knee, right ankle, left ankle and neck. Each body part of the pedestrian to be identified is calculated separately to extract its corresponding first local feature.
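The "vector transformation" of position coordinates and confidence levels is not specified further. The following is a minimal sketch under stated assumptions. The pose network is assumed to return an array of shape (T, 18, 3) that holds (x, y, confidence) per frame and body part. Coordinates are assumed to be normalized by image size, and each body part's values across frames are assumed to be concatenated into one vector. The function name is hypothetical.

```python
import numpy as np

# The 18 body parts listed in this embodiment, in the order given.
BODY_PARTS = [
    "nose", "right_eye", "left_eye", "right_ear", "left_ear",
    "right_shoulder", "left_shoulder", "right_elbow", "left_elbow",
    "right_wrist", "left_wrist", "right_hip", "left_hip",
    "right_knee", "left_knee", "right_ankle", "left_ankle", "neck",
]


def first_local_features(keypoints: np.ndarray, width: int, height: int) -> np.ndarray:
    """Turn pose output into one feature vector per body part (assumed scheme).

    keypoints: shape (T, 18, 3), holding (x, y, confidence) for T frames.
    Returns shape (18, 3 * T): per body part, the normalized coordinates and
    confidence of every frame, concatenated in frame order.
    """
    if keypoints.ndim != 3 or keypoints.shape[1:] != (len(BODY_PARTS), 3):
        raise ValueError("expected keypoints of shape (T, 18, 3)")
    scaled = keypoints.astype(float)  # astype returns a copy; input is untouched
    scaled[..., 0] /= width   # x coordinate -> [0, 1]
    scaled[..., 1] /= height  # y coordinate -> [0, 1]
    # (T, 18, 3) -> (18, T, 3) -> (18, 3T): group all frames of one body part.
    return scaled.transpose(1, 0, 2).reshape(len(BODY_PARTS), -1)
```

A learned embedding layer could replace the plain concatenation. The sketch only shows how coordinates and confidence become a per-part vector.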

In this embodiment, the pedestrian to be identified is divided into body parts and pose features are extracted for each body part. This prevents features of an occluder from being taken as features of the pedestrian when one of the pedestrian's body parts is occluded, and so ensures the accuracy of the extracted first local feature of each body part. Because the subsequent pedestrian re-identification takes the first local feature of each body part into account, the accuracy of pedestrian re-identification is improved.

S12. Input the first image sequence into a preset multi-layer convolutional neural network to obtain a first global feature of the pedestrian to be identified.

In this embodiment, a multi-layer convolutional neural network can be preset. Feature extraction is performed on each image in the first image sequence of the pedestrian to be identified through this network, and the first global feature of the pedestrian to be identified is then obtained.
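The text does not say how per-image features become one global feature for the whole sequence. One common option is temporal average pooling, and the sketch below assumes it. Here `backbone` is a hypothetical stand-in for the preset CNN that maps one frame to a 1-D vector.

```python
from typing import Callable

import numpy as np


def first_global_feature(frames: np.ndarray,
                         backbone: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Extract a feature from each frame, then average over time (assumed pooling).

    frames: shape (T, H, W, 3). backbone: maps one frame to a 1-D vector of size D.
    Returns a vector of size D.
    """
    per_frame = np.stack([backbone(frame) for frame in frames])  # (T, D)
    return per_frame.mean(axis=0)
```

With a toy backbone that returns per-channel means, two constant frames of value 1 and 3 give a global feature of `[2, 2, 2]`.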

In an optional embodiment, the inputting the first image sequence into a preset multi-layer convolutional neural network to obtain the first global feature of the pedestrian to be identified includes:

The first image sequence is input into a preset deep residual network ResNet50 for human detection, and the first global feature of the pedestrian to be recognized is obtained.

In this embodiment, the convolutional layers of the deep residual network ResNet50 perform residual learning on each picture in the first image sequence. A residual network is easier to optimize and can gain accuracy from added depth. It alleviates the degradation problem that occurs when a plain deep network is made deeper, so network performance can be improved by increasing depth, which in turn improves the accuracy of the acquired first global feature.
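The reason residual learning avoids degradation can be shown with a single residual block, y = ReLU(F(x) + x). The layers only have to learn the residual F. If extra depth is not useful, driving F to zero turns the block into an identity mapping instead of degrading the signal. The sketch is simplified: dense matrices stand in for the convolutions of ResNet50's bottleneck blocks, and batch normalization is omitted.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Compute y = ReLU(F(x) + x) with F(x) = W2 @ ReLU(W1 @ x).

    Dense weights replace ResNet50's 1x1/3x3 convolutions for brevity.
    """
    residual = w2 @ relu(w1 @ x)  # the part the layers must learn
    return relu(residual + x)     # identity shortcut adds the input back
```

With zero weights and a non-negative input, the block returns its input unchanged. That is the identity mapping a plain stack of layers would struggle to learn.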

S13. Perform the first fusion of the first local features of each body part of the pedestrian to be identified and the first global feature to obtain the second local features of each body part of the pedestrian to be identified.

In this embodiment, by fusing the first global feature of the pedestrian to be identified with the first local feature of each body part, the first local feature of each body part is made more accurate.

In an optional embodiment, performing the first fusion of the first local feature of each body part of the pedestrian to be identified with the first global feature to obtain the second local feature of each body part of the pedestrian to be identified includes:

calculating the product of the first local feature of each of the multiple body parts of the pedestrian to be identified and the first global feature of the pedestrian to be identified, to obtain the second local feature of the corresponding body part of the pedestrian to be identified.
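A per-part local feature (a few numbers) and a CNN global feature (for example 2048 values for ResNet50) normally differ in size. The kind of "product" is not specified. This sketch assumes each local feature is first mapped to the global feature's dimension by a learned projection `proj`, which is hypothetical. It then takes an element-wise (Hadamard) product with the global feature.

```python
import numpy as np


def first_fusion(local_feats: np.ndarray, global_feat: np.ndarray,
                 proj: np.ndarray) -> np.ndarray:
    """Fuse per-part local features with one global feature (assumed scheme).

    local_feats: (P, d_local) first local features for P body parts.
    global_feat: (D,) first global feature.
    proj: (d_local, D) hypothetical learned projection.
    Returns (P, D) second local features.
    """
    projected = local_feats @ proj  # (P, D): bring each part to size D
    return projected * global_feat  # element-wise product, broadcast over parts
```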

S14. Input the second local features of each body part of the pedestrian to be identified into a pre-trained pedestrian re-identification model, and receive a pedestrian re-identification result output by the pedestrian re-identification model, wherein the pedestrian re-identification model contains multiple channel attention modules and multiple position attention modules.

In this embodiment, the pedestrian re-identification model can be pre-trained. After the second local features of the multiple body parts of the pedestrian to be identified are obtained, they are input into the pre-trained pedestrian re-identification model to obtain the pedestrian re-identification result. The result is either that the pedestrian to be identified in the first image sequence is the same pedestrian, or that the pedestrian to be identified in the first image sequence is not the same pedestrian.

Specifically, the pre-training process of the pedestrian re-identification model includes:

Obtaining a second image sequence of each pedestrian sample, where each pedestrian sample contains multiple body parts;

Inputting the second image sequence of each pedestrian sample into a preset posture recognition network to obtain second position coordinates and second confidence levels of multiple body parts of each pedestrian sample;

Obtaining a third local feature of the corresponding body part according to the second position coordinates and the second confidence level of each body part of each pedestrian sample;

Inputting the second image sequence of each pedestrian sample into a preset multi-layer convolutional neural network to obtain the second global feature of each pedestrian sample;

Performing a first fusion of the third local feature of each body part of each pedestrian sample with the second global feature of that pedestrian sample to obtain a fourth local feature of each body part of each pedestrian sample;

Taking the fourth local features of multiple body parts of each pedestrian sample as the first sample data set;

Input the first sample data set into the channel attention module and the position attention module respectively for processing, and obtain the target channel attention result and the target position attention result of each pedestrian sample;

Perform a second fusion of the target channel attention result, target position attention result and the second global feature of each pedestrian sample to obtain the third global feature of each pedestrian sample;

using multiple third global features of the multiple pedestrian samples as a second sample data set;

dividing a training set and a test set from the second sample data set;

Inputting the training set into a preset neural network for training to obtain a pedestrian re-identification model;

Input the test set into the pedestrian re-identification model for testing, and calculate the pass rate of the test;

If the test pass rate is greater than or equal to a preset pass rate threshold, determining that training of the pedestrian re-identification model is complete; if the test pass rate is less than the preset pass rate threshold, updating the second sample data set to obtain a new training set, and inputting the new training set into the preset neural network to retrain the pedestrian re-identification model.

In this embodiment, the second image sequence of each pedestrian sample can be obtained in advance, where most second image sequences contain multiple consecutive frame images. The pedestrian re-identification model is trained on the multiple second image sequences of the multiple pedestrian samples.

In this embodiment, the first sample data set is obtained by the first fusion of the third local features of the multiple body parts of each pedestrian sample with the second global feature of that pedestrian sample. This ensures the effectiveness of the sample data input into the channel attention module and the position attention module.

In this embodiment, the second sample data set refers to a second fusion of the target channel attention result, the target position attention result and the second global feature of each pedestrian sample.

In this embodiment, the training set and the test set are divided from the second sample data set according to a rule that can be set in advance. For example, the training set accounts for 70% of the second sample data set and the test set accounts for 30%.
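A 70/30 division like the one in this example can be sketched as a seeded shuffle followed by a cut. The shuffle and the fixed seed are assumptions added for reproducibility. The text only gives the proportions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], train_ratio: float = 0.7,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and cut it into (training set, test set)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]
```

For 100 third global features, this yields 70 training samples and 30 test samples with no overlap.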

In this embodiment, the pass rate threshold can be set in advance, for example to 98%. When the test pass rate is greater than or equal to 98%, the pedestrian re-identification model is determined to have passed and training ends. When the test pass rate is less than 98%, the second sample data set is updated to increase the number of second image sequences in the training set, which yields a new training set. The new training set is input into the preset neural network for training, and these steps are repeated until the test pass rate is greater than or equal to 98%.
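The train, test and enlarge loop can be sketched as follows. The three callables `train`, `pass_rate` and `add_samples` are hypothetical placeholders for the preset neural network training, the test procedure and the data-set update. The `max_rounds` cap is an added safeguard that the text does not mention.

```python
from typing import Callable, List, Tuple, TypeVar

M = TypeVar("M")  # trained model type
S = TypeVar("S")  # sample type


def train_until_pass(train_set: List[S], test_set: List[S],
                     train: Callable[[List[S]], M],
                     pass_rate: Callable[[M, List[S]], float],
                     add_samples: Callable[[List[S]], List[S]],
                     threshold: float = 0.98,
                     max_rounds: int = 10) -> Tuple[M, float, int]:
    """Retrain on an enlarged training set until the test pass rate reaches the threshold."""
    for round_no in range(1, max_rounds + 1):
        model = train(train_set)
        rate = pass_rate(model, test_set)
        if rate >= threshold:  # pass: training is over
            return model, rate, round_no
        train_set = add_samples(train_set)  # update the data, then retrain
    raise RuntimeError("pass-rate threshold not reached within max_rounds")
```

In a toy run where the "model" is the training-set size, the pass rate is `size / 100`, and each update adds 10 samples, starting from 80 samples passes on the third round.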

Further, inputting the first sample data set into the channel attention module and the position attention module respectively for processing, to obtain the target channel attention result and the target position attention result of each pedestrian sample, includes:

Obtaining the fourth local features of each body part of each pedestrian sample from the first sample data set, wherein each body part corresponds to a channel attention module and a position attention module;

Inputting the multiple fourth local features of the multiple body parts of each pedestrian sample into the corresponding channel attention modules and position attention modules respectively for weighting, to obtain a channel attention result and a position attention result for each body part of each pedestrian sample;

Calculating a first average value of the multiple channel attention results of the multiple body parts of each pedestrian sample, and determining the first average value as the target channel attention result of that pedestrian sample; and calculating a second average value of the multiple position attention results of the multiple body parts of each pedestrian sample, and determining the second average value as the target position attention result of that pedestrian sample.
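The averaging step is simple enough to sketch directly. It assumes that all per-part results of one kind share the same shape, so they can be stacked and averaged element-wise.

```python
from typing import Sequence, Tuple

import numpy as np


def target_attention(channel_results: Sequence[np.ndarray],
                     position_results: Sequence[np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
    """Average per-body-part attention results into one pair per pedestrian sample.

    Returns (first average value, second average value), i.e. the target channel
    attention result and the target position attention result.
    """
    target_channel = np.mean(np.stack(channel_results), axis=0)
    target_position = np.mean(np.stack(position_results), axis=0)
    return target_channel, target_position
```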

In this embodiment, the channel attention module and the position attention module focus on the meaningful parts of the fourth local feature of each body part of each pedestrian sample. The channel attention module can use both global average pooling and global max pooling to identify what is meaningful in the fourth local feature of each body part. The position attention module can apply max pooling and average pooling, concatenate the two pooled results and feed them into a convolutional layer. Weight coefficients are then obtained through a Sigmoid activation function, and these determine the meaningful fourth local features of each body part of each pedestrian sample.
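The two modules described above closely resemble the channel and spatial attention of the Convolutional Block Attention Module (CBAM). The sketch below follows that design. It assumes a shared two-layer MLP for channel attention and a single k x k convolution for position attention. The weights `w1`, `w2` and `kernel`, the reduction size and the kernel size are hypothetical, and the application's actual module may differ. The loop computes a plain cross-correlation, as deep-learning "convolution" layers do.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """feat: (C, H, W); w1: (C_r, C), w2: (C, C_r) shared MLP. Returns weights (C, 1, 1)."""
    avg = feat.mean(axis=(1, 2))  # global average pooling per channel
    mx = feat.max(axis=(1, 2))    # global max pooling per channel

    def mlp(v: np.ndarray) -> np.ndarray:
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]


def position_attention(feat: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """feat: (C, H, W); kernel: (2, k, k) with k odd. Returns weights (1, H, W)."""
    pooled = np.stack([feat.max(axis=0), feat.mean(axis=0)])  # concatenated (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero "same" padding
    h, w = pooled.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None, :, :]
```

The weighted features would then be `feat * channel_attention(...)` and `feat * position_attention(...)`.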

In this embodiment, during training of the pedestrian re-identification model, the fourth local feature of each body part of each pedestrian sample is input into the channel attention module and the position attention module respectively once it is obtained. It is weighted according to the pose weight of that body part, so that the channel attention result and position attention result of each body part are more accurate. This in turn ensures the accuracy of the target channel attention result and the target position attention result of each pedestrian sample.

In this embodiment, the first average value is obtained by averaging the multiple channel attention results of the multiple body parts of each pedestrian sample, and the second average value is obtained by averaging the multiple position attention results of the multiple body parts of each pedestrian sample.

Further, performing the second fusion of the target channel attention result, the target position attention result and the second global feature of each pedestrian sample to obtain the third global feature of each pedestrian sample includes:

Compute the product between the target channel attention result, the target position attention result and the second global feature for each pedestrian sample to obtain the third global feature for each pedestrian sample.

In this embodiment, the second fusion combines the target channel attention result of each pedestrian sample with its target position attention result and multiplies the combination by the second global feature of that pedestrian sample. The result is a new feature emphasized by dual attention on body parts, namely the third global feature of each pedestrian sample.
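If the channel result has shape (C, 1, 1) and the position result has shape (1, H, W), as in CBAM-style modules, a three-way product of the kind described above reduces to NumPy broadcasting. This sketch assumes those shapes. It also assumes the second global feature is kept as a (C, H, W) feature map rather than pooled to a vector. The text does not state either point.

```python
import numpy as np


def second_fusion(target_channel: np.ndarray, target_position: np.ndarray,
                  global_map: np.ndarray) -> np.ndarray:
    """Multiply the two target attention results with the second global feature.

    target_channel: (C, 1, 1); target_position: (1, H, W); global_map: (C, H, W).
    Broadcasting yields the (C, H, W) third global feature.
    """
    return target_channel * target_position * global_map
```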

In this embodiment, the second sample data set used to train the pedestrian re-identification model consists of third global features emphasized by dual attention on body parts. This ensures the accuracy of the features in the training set, so the trained pedestrian re-identification model is better optimized and the accuracy of pedestrian re-identification is improved.

In this embodiment, when the pedestrian re-identification model is created, the third local feature of each body part is calculated for each pedestrian sample. Calculating each body part of each pedestrian sample separately reduces the effect of hard samples with small inter-class distance, so that different pedestrians with similar appearances can be distinguished and the accuracy of pedestrian re-identification is improved. In addition, when a pedestrian is partly occluded, calculating each body part separately keeps the occluders out of the overall calculation. This reduces their influence and improves the accuracy of subsequent pedestrian re-identification.

In this embodiment, for the first image sequence of the pedestrian to be identified, the pose recognition network yields the first position coordinates and the first confidence level of each body part, and the first local feature of each body part is obtained from them. The preset multi-layer convolutional neural network then yields the first global feature of the pedestrian to be identified. The first local feature of each body part is multiplied by this first global feature to obtain the second local feature of each body part. The second local feature of each body part is input into the channel attention module and the position attention module respectively to obtain the target channel attention result and the target position attention result of the pedestrian to be identified. These two results are combined and multiplied by the first global feature of the pedestrian to be identified, which yields a global feature of the pedestrian to be identified that is emphasized by dual attention on body parts.

To sum up, the pedestrian re-identification method described in this embodiment offers the following benefits. First, the first image sequence of the pedestrian to be identified is obtained and input into the preset pose recognition network to obtain the first local feature of each body part of the pedestrian to be identified. This prevents features of an occluder from being taken as features of the pedestrian when one of the pedestrian's body parts is occluded, and ensures the accuracy of the extracted first local features. Second, the first local feature of each body part is fused with the first global feature in a first fusion to obtain the second local feature of each body part, which makes the local feature of each body part more accurate. Finally, the second local features of the multiple body parts are input into the pre-trained pedestrian re-identification model, which contains multiple channel attention modules and multiple position attention modules. During training, the fourth local feature of each body part of each pedestrian sample is input into the channel attention module and the position attention module respectively and weighted according to the pose weight of that body part. The channel attention result and position attention result of each body part are therefore more accurate, which improves the accuracy of pedestrian re-identification.

Embodiment two

In some embodiments, the pedestrian re-identification device 20 may include a plurality of functional modules composed of program code segments. The program codes of each program segment in the pedestrian re-identification device 20 can be stored in the memory of the electronic device, and executed by the at least one processor to perform the pedestrian re-identification function (see FIG. 1 for details).

In this embodiment, the pedestrian re-identification device 20 can be divided into multiple functional modules according to the functions it performs. The functional modules may include: an acquisition module 201, a first input module 202, a fusion module 203 and a second input module 204. A module in this application refers to a series of computer-readable instruction segments that are stored in a memory and can be executed by at least one processor to complete a fixed function. The functions of each module are described in detail below.

An acquisition module 201, configured to acquire a first image sequence of a pedestrian to be identified, input the first image sequence into a preset pose recognition network, and obtain a first local feature of each body part of the pedestrian to be identified, wherein the pedestrian to be identified includes multiple body parts.

The first input module 202 is configured to input the first image sequence into a preset multi-layer convolutional neural network to obtain the first global feature of the pedestrian to be identified.

The fusion module 203 is configured to perform a first fusion of the first local feature of each body part of the pedestrian to be identified with the first global feature, to obtain the second local feature of each body part of the pedestrian to be identified.

The second input module 204 is configured to input the second local features of each body part of the pedestrian to be identified into a pre-trained pedestrian re-identification model, and receive a pedestrian re-identification result output by the pedestrian re-identification model, wherein the pedestrian re-identification model includes multiple channel attention modules and multiple position attention modules.
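How modules 201 to 204 chain together can be sketched as a small class. Each module is modelled as an injected callable. The class, field and method names are hypothetical, and the real modules are instruction segments executed by the processor as described above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PedestrianReIdDevice:
    """Hypothetical composition of device 20's functional modules."""
    acquire: Callable[[Any], Any]       # module 201: sequence -> first local features
    global_cnn: Callable[[Any], Any]    # module 202: sequence -> first global feature
    fuse: Callable[[Any, Any], Any]     # module 203: first fusion -> second local features
    reid_model: Callable[[Any], Any]    # module 204: pre-trained re-identification model

    def run(self, image_sequence: Any) -> Any:
        local1 = self.acquire(image_sequence)
        global1 = self.global_cnn(image_sequence)
        local2 = self.fuse(local1, global1)
        return self.reid_model(local2)  # re-identification result
```

Injecting the modules keeps the device independent of any particular pose network or CNN, which matches the text's note that the modules are interchangeable program segments.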

To sum up, the pedestrian re-identification device described in this embodiment offers the following benefits. First, it acquires the first image sequence of the pedestrian to be identified and inputs it into the preset pose recognition network to obtain the first local feature of each body part of the pedestrian to be identified. This prevents features of an occluder from being taken as features of the pedestrian when one of the pedestrian's body parts is occluded, and ensures the accuracy of the extracted first local features. Second, the first local feature of each body part is fused with the first global feature in a first fusion to obtain the second local feature of each body part, which makes the local feature of each body part more accurate. Finally, the second local features of the multiple body parts are input into the pre-trained pedestrian re-identification model, which contains multiple channel attention modules and multiple position attention modules. During training, the fourth local feature of each body part of each pedestrian sample is input into the channel attention module and the position attention module respectively and weighted according to the pose weight of that body part. The channel attention result and position attention result of each body part are therefore more accurate, which improves the accuracy of pedestrian re-identification.

Embodiment three

FIG. 3 is a schematic structural diagram of the electronic device provided by Embodiment 3 of the present application. In a preferred embodiment of the present application, the electronic device 3 includes a memory 31, at least one processor 32, at least one communication bus 33 and a transceiver 34.

Those skilled in the art should understand that the structure of the electronic device shown in FIG. 3 does not constitute a limitation on the embodiments of the present application. It may be a bus structure or a star structure, and the electronic device 3 may also include more or fewer hardware or software components than shown, or a different arrangement of components.

In some embodiments, the electronic device 3 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits, programmable gate arrays, digital processors and embedded devices. The electronic device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a user through a keyboard, mouse, remote control, touch pad or voice-activated device, for example personal computers, tablets, smartphones and digital cameras.

It should be noted that the electronic device 3 is only an example; other existing or future electronic products that can be adapted to the present application also fall within the scope of protection of the present application and are incorporated herein by reference.

In some embodiments, the memory 31 is used to store program code and various data, such as the pedestrian re-identification device 20 installed in the electronic device 3, and to provide high-speed, automatic access to programs or data while the electronic device 3 is running. The memory 31 includes non-volatile and volatile memory, such as read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.

In some embodiments, the at least one processor 32 may consist of integrated circuits, for example a single packaged integrated circuit or multiple packaged integrated circuits with the same or different functions, including one or a combination of central processing units (CPUs), microprocessors, digital processing chips, graphics processors and various control chips. The at least one processor 32 is the control unit of the electronic device 3. It connects the components of the entire electronic device 3 through various interfaces and lines, and performs the functions of the electronic device 3 and processes data by running or executing the programs or modules stored in the memory 31 and calling the data stored in the memory 31.

In some embodiments, the at least one communication bus 33 is configured to implement connection and communication between the memory 31, the at least one processor 32 and other components.

Although not shown, the electronic device 3 may further include a power supply (such as a battery) for supplying power to the components. Optionally, the power supply may be logically connected to the at least one processor 32 through a power management device, so that charging, discharging and power consumption are managed through the power management device. The power supply may further include any components such as one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators. The electronic device 3 may further include various sensors, a Bluetooth module, a Wi-Fi module and the like, which are not described in detail here.

It should be understood that the described embodiments are for illustration only, and the scope of the patent application is not limited by this structure.

An integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, an electronic device, a network device, etc.) or a processor to execute part of the methods described in the embodiments of the present application.

In a further embodiment, referring to FIG. 2, the at least one processor 32 can execute the operating system of the electronic device 3 as well as various installed applications (such as the pedestrian re-identification device 20), program code and the like, for example the modules described above.

Program code is stored in the memory 31, and the at least one processor 32 can invoke the program code stored in the memory 31 to execute related functions. For example, the modules described in FIG. 2 are program code stored in the memory 31 and executed by the at least one processor 32 to implement the functions of those modules and thereby achieve pedestrian re-identification.

In one embodiment of the present application, the memory 31 stores a plurality of computer-readable instructions, and the plurality of computer-readable instructions are executed by the at least one processor 32 to implement the pedestrian re-identification function.

Exemplarily, the program code may be divided into one or more modules/units, which are stored in the memory 31 and executed by the processor 32 to implement the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution of the computer program in the electronic device 3. For example, the program code can be divided into an acquisition module 201, a first input module 202, a fusion module 203 and a second input module 204.
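As an illustration only, the division into four modules can be sketched as a small wiring object in which each module is a pluggable callable. This is a hypothetical sketch, not the applicant's code: the class name ReIdProgram and the field names are invented, and the role given to the first input module 202 (producing the first global feature) is an assumption based on the order of the method steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Frames = Sequence[object]          # placeholder for the decoded images of one pedestrian
PartFeatures = Dict[str, Vector]   # body-part name -> feature vector


@dataclass
class ReIdProgram:
    """Hypothetical wiring of modules 201-204; every field is a stand-in callable."""
    acquisition: Callable[[Frames], PartFeatures]           # 201: posture network -> first local features
    first_input: Callable[[Frames], Vector]                 # 202: CNN -> first global feature (assumed role)
    fusion: Callable[[PartFeatures, Vector], PartFeatures]  # 203: first fusion -> second local features
    second_input: Callable[[PartFeatures], str]             # 204: re-id model -> identification result

    def run(self, frames: Frames) -> str:
        first_local = self.acquisition(frames)
        first_global = self.first_input(frames)
        second_local = self.fusion(first_local, first_global)
        return self.second_input(second_local)


if __name__ == "__main__":
    program = ReIdProgram(
        acquisition=lambda fr: {"head": [1.0, 2.0]},
        first_input=lambda fr: [3.0, 4.0],
        fusion=lambda loc, g: {p: [a * b for a, b in zip(v, g)] for p, v in loc.items()},
        second_input=lambda loc: "id-7" if loc["head"] == [3.0, 8.0] else "unknown",
    )
    print(program.run([None]))  # id-7
```

Keeping each module behind a callable mirrors the statement that the division is only a logical one: any module can be replaced or merged without changing the others.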

Specifically, for how the at least one processor 32 implements the above instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the modules is only a logical function division, and there may be other division methods in actual implementation.

Further, the computer-readable storage medium may be non-volatile or volatile.

Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store data created through the use of blockchain nodes, and the like.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer and an application service layer.
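The idea of blocks "linked to one another by cryptographic methods" can be illustrated with a toy hash chain, in which each block stores the SHA-256 hash of its predecessor so that editing an earlier block invalidates every later link. This is a generic, hypothetical sketch (the names HashChain, block_hash and GENESIS_PREV are invented); it has no consensus, networking or transactions, and it is not the storage scheme used for the first image sequence in this application.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64  # all-zero "previous hash" for the first block (a common convention, assumed here)


def block_hash(index: int, prev_hash: str, payload: str) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = json.dumps({"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class HashChain:
    """Toy append-only chain: each block records the hash of its predecessor."""

    def __init__(self) -> None:
        self.blocks: List[Dict[str, object]] = []

    def append(self, payload: str) -> Dict[str, object]:
        index = len(self.blocks)
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS_PREV
        block = {"index": index, "prev": prev, "payload": payload,
                 "hash": block_hash(index, prev, payload)}
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every hash; tampering with any block breaks the chain from there on."""
        prev = GENESIS_PREV
        for i, b in enumerate(self.blocks):
            if b["index"] != i or b["prev"] != prev:
                return False
            if b["hash"] != block_hash(i, prev, b["payload"]):
                return False
            prev = b["hash"]
        return True


if __name__ == "__main__":
    chain = HashChain()
    # Store only a digest of each (hypothetical) image sequence, not the images themselves.
    chain.append(hashlib.sha256(b"sequence-0001").hexdigest())
    chain.append(hashlib.sha256(b"sequence-0002").hexdigest())
    print(chain.verify())             # True
    chain.blocks[0]["payload"] = "tampered"
    print(chain.verify())             # False
```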

The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, and may be located in one place or distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or in the form of hardware plus software function modules.

It will be apparent to those skilled in the art that the present application is not limited to the details of the exemplary embodiments described above, and that the present application can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be regarded in all respects as exemplary and not restrictive; the scope of the present application is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced in the present application. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" clearly does not exclude other elements or steps, and the singular does not exclude the plural. Multiple units or means recited in this application may also be implemented by a single unit or means through software or hardware. The words "first", "second" and the like are used to denote names and do not imply any particular order.

Finally, it should be noted that the above embodiments are only used to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent replacements may be made to the technical solutions of the present application without departing from their spirit and scope.

Claims

A pedestrian re-identification method, wherein the method includes:

Acquiring a first image sequence of a pedestrian to be identified, inputting the first image sequence into a preset posture recognition network, and obtaining a first local feature of each body part of the pedestrian to be identified, wherein the pedestrian to be identified has multiple body parts;

Inputting the first image sequence into a preset multi-layer convolutional neural network to obtain the first global feature of the pedestrian to be identified;

performing a first fusion of the first local features of each of the body parts of the pedestrian to be identified with the first global feature to obtain a second local feature of each of the body parts of the pedestrian to be identified;

inputting the second local features of each body part of the pedestrian to be identified into a pre-trained pedestrian re-identification model, and receiving a pedestrian re-identification result output by the pedestrian re-identification model, wherein the pedestrian re-identification model contains multiple channel attention modules and multiple position attention modules.
The pedestrian re-identification method according to claim 1, wherein inputting the first image sequence into the preset posture recognition network to obtain the first local feature of each body part of the pedestrian to be identified comprises:

The first image sequence is input into the preset posture recognition network, and each image in the first image sequence is detected in the preset posture recognition network to extract the body parts of the pedestrian to be identified;

Acquiring the first position coordinates and the first confidence level of each body part of the pedestrian to be identified;

Performing vector transformation on the first position coordinates and the first confidence level of each body part of the pedestrian to be identified, to obtain the first local feature of the corresponding body part of the pedestrian to be identified.
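The claim does not specify the "vector transformation". One possible reading is sketched below, purely as an assumption: each detected body part is reported as (x, y, confidence), the pixel coordinates are normalised by the image size, the triple is projected to D dimensions by a hypothetical D x 3 matrix proj, and the per-frame features are averaged over the image sequence. The threshold min_conf, which zeroes parts judged to be occluded, is also hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence): assumed pose-estimator output per body part


def part_feature(kp: Keypoint, width: float, height: float,
                 proj: Sequence[Sequence[float]], min_conf: float = 0.3) -> List[float]:
    """Turn one body part's coordinates and confidence into a D-dim local feature.

    proj (D x 3) and min_conf are hypothetical; a part whose confidence falls
    below min_conf is treated as occluded and mapped to a zero vector.
    """
    x, y, conf = kp
    if conf < min_conf:
        return [0.0] * len(proj)
    v = (x / width, y / height, conf)  # normalise pixel coordinates to [0, 1]
    return [sum(w * t for w, t in zip(row, v)) for row in proj]


def first_local_features(frames: Sequence[Dict[str, Keypoint]], width: float, height: float,
                         proj: Sequence[Sequence[float]], min_conf: float = 0.3) -> Dict[str, List[float]]:
    """Average each body part's per-frame feature over the image sequence."""
    result: Dict[str, List[float]] = {}
    for part in frames[0]:
        feats = [part_feature(frame[part], width, height, proj, min_conf) for frame in frames]
        result[part] = [sum(col) / len(feats) for col in zip(*feats)]
    return result


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    frames = [{"head": (50.0, 100.0, 0.9)}, {"head": (50.0, 100.0, 0.1)}]  # head occluded in frame 2
    print(first_local_features(frames, 100.0, 200.0, identity))
```

In a trained system proj would be learned; the identity matrix above only makes the arithmetic easy to follow.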
The pedestrian re-identification method according to claim 1, wherein performing the first fusion of the first local feature of each body part of the pedestrian to be identified with the first global feature to obtain the second local feature of each body part of the pedestrian to be identified comprises:

calculating the product of the first local feature of each of the multiple body parts of the pedestrian to be identified and the first global feature of the pedestrian to be identified, to obtain the second local feature of the corresponding body part of the pedestrian to be identified.
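The claim says "product" without fixing its kind. A minimal sketch, assuming an element-wise (Hadamard) product between vectors of equal length, so that the global feature acts as a per-dimension gate on each body part's local feature; the function name first_fusion is hypothetical.

```python
from typing import Dict, List


def first_fusion(first_local: Dict[str, List[float]],
                 first_global: List[float]) -> Dict[str, List[float]]:
    """Second local feature of each part = element-wise product with the global feature (assumed)."""
    second_local: Dict[str, List[float]] = {}
    for part, vec in first_local.items():
        if len(vec) != len(first_global):
            raise ValueError(f"{part}: local dim {len(vec)} != global dim {len(first_global)}")
        second_local[part] = [a * g for a, g in zip(vec, first_global)]
    return second_local


if __name__ == "__main__":
    print(first_fusion({"head": [1.0, 2.0], "leg": [0.5, 0.0]}, [2.0, 3.0]))
    # {'head': [2.0, 6.0], 'leg': [1.0, 0.0]}
```

An outer product would be another valid reading of "product"; it would yield a matrix per body part rather than a vector of the same length.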
The pedestrian re-identification method according to claim 1, wherein, before inputting the second local features of each body part of the pedestrian to be identified into the pre-trained pedestrian re-identification model, the method further comprises:

Obtaining a second image sequence of each pedestrian sample, where each pedestrian sample contains multiple body parts;

Inputting the second image sequence of each pedestrian sample into a preset posture recognition network to obtain second position coordinates and second confidence levels of multiple body parts of each pedestrian sample;

Obtaining a third local feature of the corresponding body part according to the second position coordinates and the second confidence level of each body part of each pedestrian sample;

Inputting the second image sequence of each pedestrian sample into a preset multi-layer convolutional neural network to obtain the second global feature of each pedestrian sample;

Performing a first fusion of the third local feature of each body part of each pedestrian sample with the first global feature to obtain a fourth local feature of each body part of each pedestrian sample;

Taking the fourth local features of multiple body parts of each pedestrian sample as the first sample data set;

Input the first sample data set into the channel attention module and the position attention module respectively for processing, and obtain the target channel attention result and the target position attention result of each pedestrian sample;

Perform a second fusion of the target channel attention result, target position attention result and the second global feature of each pedestrian sample to obtain the third global feature of each pedestrian sample;

using multiple third global features of the multiple pedestrian samples as a second sample data set;

dividing a training set and a test set from the second sample data set;

Inputting the training set into a preset neural network for training to obtain a pedestrian re-identification model;

Input the test set into the pedestrian re-identification model for testing, and calculate the pass rate of the test;

If the test pass rate is greater than or equal to a preset pass rate threshold, it is determined that training of the pedestrian re-identification model is complete; if the test pass rate is less than the preset pass rate threshold, the second sample data set is updated to obtain a new training set, and the new training set is input into the preset neural network to retrain the pedestrian re-identification model.
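The split / train / test / retrain loop above can be sketched as follows. This is a hypothetical illustration under stated assumptions: a nearest-centroid classifier stands in for the unspecified "preset neural network", update_fn is a placeholder for however the training data is updated, the test set is kept fixed across rounds, and max_rounds is added so the loop always terminates.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]      # (third global feature, identity label)
Model = Callable[[List[float]], int]


def split(data: Sequence[Sample], test_ratio: float, seed: int) -> Tuple[List[Sample], List[Sample]]:
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]      # (training set, test set)


def pass_rate(model: Model, test_set: Sequence[Sample]) -> float:
    return sum(model(x) == y for x, y in test_set) / len(test_set)


def nearest_centroid(train_set: Sequence[Sample]) -> Model:
    """Stand-in for the 'preset neural network': one centroid per identity."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train_set:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x: List[float]) -> int:
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return predict


def train_until_pass(data: Sequence[Sample],
                     train_fn: Callable[[Sequence[Sample]], Model],
                     update_fn: Callable[[List[Sample]], List[Sample]],
                     threshold: float = 0.9, test_ratio: float = 0.2,
                     max_rounds: int = 5, seed: int = 0) -> Tuple[Model, float, int]:
    train_set, test_set = split(data, test_ratio, seed)
    for round_no in range(1, max_rounds + 1):
        model = train_fn(train_set)
        rate = pass_rate(model, test_set)
        if rate >= threshold:
            return model, rate, round_no              # training is considered finished
        train_set = update_fn(train_set)              # hypothetical data update before retraining
    raise RuntimeError(f"pass rate stayed below {threshold} after {max_rounds} rounds")
```

Usage: `train_until_pass(data, nearest_centroid, lambda t: t)` returns the model, its pass rate and the number of rounds used.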
The pedestrian re-identification method according to claim 4, wherein inputting the first sample data set into the channel attention module and the position attention module respectively for processing to obtain the target channel attention result and the target position attention result of each pedestrian sample comprises:

Obtaining the fourth local features of each body part of each pedestrian sample from the first sample data set, wherein each body part corresponds to a channel attention module and a position attention module;

The multiple fourth local features of the multiple body parts of each pedestrian sample are respectively input into the corresponding channel attention modules and the corresponding position attention modules for weighting processing, to obtain the channel attention result and the position attention result of each body part of each pedestrian sample;

A first average value of the multiple channel attention results of the multiple body parts of each pedestrian sample is calculated and determined as the target channel attention result of the corresponding pedestrian sample, and a second average value of the multiple position attention results of the multiple body parts of each pedestrian sample is calculated and determined as the target position attention result of the corresponding pedestrian sample.
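The application does not define the internals of its attention modules. The sketch below assumes a simplified self-attention in the style of the channel and position attention modules of DANet, without learned query/key/value projections: each body part's fourth local feature is taken to be a C x N map (C channels, N positions), channel attention uses the C x C affinity softmax(F F^T), position attention uses the N x N affinity softmax(F^T F), and a per-part scalar gamma (hypothetical) plays the role of that part's module parameters. The target results are then the element-wise means over body parts, as in the averaging step above.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # C x N: C channels, N spatial positions (assumed layout)


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)                          # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def add_scaled(a: Matrix, b: Matrix, gamma: float) -> Matrix:
    """gamma * a + b (residual connection)."""
    return [[gamma * x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def channel_attention(f: Matrix, gamma: float) -> Matrix:
    """Row j of the output mixes all channels i with weights softmax_i(F_j . F_i)."""
    attn = softmax_rows(matmul(f, transpose(f)))         # C x C
    return add_scaled(matmul(attn, f), f, gamma)


def position_attention(f: Matrix, gamma: float) -> Matrix:
    """Column j of the output mixes all positions i with weights S[j][i] = softmax_i(F[:, j] . F[:, i])."""
    attn = softmax_rows(matmul(transpose(f), f))         # N x N
    return add_scaled(matmul(f, transpose(attn)), f, gamma)


def mean_maps(maps: List[Matrix]) -> Matrix:
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / len(maps) for c in range(cols)] for r in range(rows)]


def target_attention(parts: Dict[str, Matrix],
                     gammas: Dict[str, Tuple[float, float]]) -> Tuple[Matrix, Matrix]:
    """Per-part channel and position attention, then the mean over all body parts."""
    channel = [channel_attention(f, gammas[p][0]) for p, f in parts.items()]
    position = [position_attention(f, gammas[p][1]) for p, f in parts.items()]
    return mean_maps(channel), mean_maps(position)
```

With every gamma set to zero the modules reduce to identity maps, which makes the averaging step easy to check by hand.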
The pedestrian re-identification method according to claim 4, wherein performing the second fusion of the target channel attention result, the target position attention result and the second global feature of each pedestrian sample to obtain the third global feature of each pedestrian sample comprises:

Compute the product between the target channel attention result, the target position attention result and the second global feature for each pedestrian sample to obtain the third global feature for each pedestrian sample.
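As with the first fusion, the kind of product is not specified. A minimal sketch, assuming the three inputs are vectors of equal length combined element-wise; if the attention results are two-dimensional maps they can first be flattened row by row. Both function names are hypothetical.

```python
from typing import List, Sequence


def flatten(m: Sequence[Sequence[float]]) -> List[float]:
    """Row-major flattening of a 2-D map into a vector."""
    return [v for row in m for v in row]


def second_fusion(channel_res: Sequence[float], position_res: Sequence[float],
                  second_global: Sequence[float]) -> List[float]:
    """Third global feature = element-wise product of the three inputs (assumed equal length)."""
    if not (len(channel_res) == len(position_res) == len(second_global)):
        raise ValueError("all three inputs must have the same length")
    return [c * p * g for c, p, g in zip(channel_res, position_res, second_global)]


if __name__ == "__main__":
    print(second_fusion([1.0, 2.0], [3.0, 0.5], [2.0, 4.0]))  # [6.0, 4.0]
```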
The pedestrian re-identification method according to claim 1, wherein said inputting said first image sequence into a preset multi-layer convolutional neural network to obtain the first global feature of said pedestrian to be identified comprises:

The first image sequence is input into a preset deep residual network ResNet50 for human detection, and the first global feature of the pedestrian to be recognized is obtained.
An electronic device, wherein the electronic device includes a memory and a processor, the memory is used to store at least one computer-readable instruction, and the processor is used to execute the at least one computer-readable instruction to implement the following steps:

Acquiring a first image sequence of a pedestrian to be identified, inputting the first image sequence into a preset posture recognition network, and obtaining a first local feature of each body part of the pedestrian to be identified, wherein the pedestrian to be identified has multiple body parts;

Inputting the first image sequence into a preset multi-layer convolutional neural network to obtain the first global feature of the pedestrian to be identified;

performing a first fusion of the first local features of each of the body parts of the pedestrian to be identified with the first global feature to obtain a second local feature of each of the body parts of the pedestrian to be identified;

inputting the second local features of each body part of the pedestrian to be identified into a pre-trained pedestrian re-identification model, and receiving a pedestrian re-identification result output by the pedestrian re-identification model, wherein the pedestrian re-identification model contains multiple channel attention modules and multiple position attention modules.
The electronic device according to claim 8, wherein, when the processor executes the at least one computer-readable instruction to input the first image sequence into the preset posture recognition network and obtain the first local feature of each body part of the pedestrian to be identified, the steps specifically include:

The first image sequence is input into the preset posture recognition network, and each image in the first image sequence is detected in the preset posture recognition network to extract the body parts of the pedestrian to be identified;

Acquiring the first position coordinates and the first confidence level of each body part of the pedestrian to be identified;

Performing vector transformation on the first position coordinates and the first confidence level of each body part of the pedestrian to be identified, to obtain the first local feature of the corresponding body part of the pedestrian to be identified.
The electronic device according to claim 8, wherein, when the processor executes the at least one computer-readable instruction to perform the first fusion of the first local feature of each body part of the pedestrian to be identified with the first global feature to obtain the second local feature of each body part of the pedestrian to be identified, the steps specifically include:

calculating the product of the first local feature of each of the multiple body parts of the pedestrian to be identified and the first global feature of the pedestrian to be identified, to obtain the second local feature of the corresponding body part of the pedestrian to be identified.
The electronic device according to claim 8, wherein, before the second local features of each body part of the pedestrian to be identified are input into the pre-trained pedestrian re-identification model, the processor executes the at least one computer-readable instruction to further implement the following steps:

Obtaining a second image sequence of each pedestrian sample, where each pedestrian sample contains multiple body parts;

Inputting the second image sequence of each pedestrian sample into a preset posture recognition network to obtain second position coordinates and second confidence levels of multiple body parts of each pedestrian sample;

Obtaining a third local feature of the corresponding body part according to the second position coordinates and the second confidence level of each body part of each pedestrian sample;

Inputting the second image sequence of each pedestrian sample into a preset multi-layer convolutional neural network to obtain the second global feature of each pedestrian sample;

Performing a first fusion of the third local feature of each body part of each pedestrian sample with the first global feature to obtain a fourth local feature of each body part of each pedestrian sample;

Taking the fourth local features of multiple body parts of each pedestrian sample as the first sample data set;

Input the first sample data set into the channel attention module and the position attention module respectively for processing, and obtain the target channel attention result and the target position attention result of each pedestrian sample;

Perform a second fusion of the target channel attention result, target position attention result and the second global feature of each pedestrian sample to obtain the third global feature of each pedestrian sample;

using multiple third global features of the multiple pedestrian samples as a second sample data set;

dividing a training set and a test set from the second sample data set;

Inputting the training set into a preset neural network for training to obtain a pedestrian re-identification model;

Input the test set into the pedestrian re-identification model for testing, and calculate the pass rate of the test;

If the test pass rate is greater than or equal to a preset pass rate threshold, it is determined that training of the pedestrian re-identification model is complete; if the test pass rate is less than the preset pass rate threshold, the second sample data set is updated to obtain a new training set, and the new training set is input into the preset neural network to retrain the pedestrian re-identification model.
The electronic device according to claim 11, wherein, when the processor executes the at least one computer-readable instruction to input the first sample data set into the channel attention module and the position attention module respectively for processing to obtain the target channel attention result and the target position attention result of each pedestrian sample, the steps specifically include:

Obtaining the fourth local features of each body part of each pedestrian sample from the first sample data set, wherein each body part corresponds to a channel attention module and a position attention module;

The multiple fourth local features of the multiple body parts of each pedestrian sample are respectively input into the corresponding channel attention modules and the corresponding position attention modules for weighting processing, to obtain the channel attention result and the position attention result of each body part of each pedestrian sample;

A first average value of the multiple channel attention results of the multiple body parts of each pedestrian sample is calculated and determined as the target channel attention result of the corresponding pedestrian sample, and a second average value of the multiple position attention results of the multiple body parts of each pedestrian sample is calculated and determined as the target position attention result of the corresponding pedestrian sample.
The electronic device according to claim 11, wherein, when the processor executes the at least one computer-readable instruction to perform the second fusion of the target channel attention result, the target position attention result and the second global feature of each pedestrian sample to obtain the third global feature of each pedestrian sample, the steps specifically include:

Compute the product between the target channel attention result, the target position attention result and the second global feature for each pedestrian sample to obtain the third global feature for each pedestrian sample.
A computer-readable storage medium, wherein the computer-readable storage medium stores at least one computer-readable instruction, and when the at least one computer-readable instruction is executed by a processor, the following steps are implemented:

Acquiring a first image sequence of a pedestrian to be identified, inputting the first image sequence into a preset posture recognition network, and obtaining a first local feature of each body part of the pedestrian to be identified, wherein the pedestrian to be identified has multiple body parts;

Inputting the first image sequence into a preset multi-layer convolutional neural network to obtain the first global feature of the pedestrian to be identified;

performing a first fusion of the first local features of each of the body parts of the pedestrian to be identified with the first global feature to obtain a second local feature of each of the body parts of the pedestrian to be identified;

inputting the second local features of each body part of the pedestrian to be identified into a pre-trained pedestrian re-identification model, and receiving a pedestrian re-identification result output by the pedestrian re-identification model, wherein the pedestrian re-identification model contains multiple channel attention modules and multiple position attention modules.
The storage medium according to claim 14, wherein, when the at least one computer-readable instruction is executed by the processor to input the first image sequence into the preset posture recognition network and obtain the first local feature of each body part of the pedestrian to be identified, the steps specifically include:

The first image sequence is input into the preset posture recognition network, and each image in the first image sequence is detected in the preset posture recognition network to extract the body parts of the pedestrian to be identified;

Acquiring the first position coordinates and the first confidence level of each body part of the pedestrian to be identified;

Performing vector transformation on the first position coordinates and the first confidence level of each body part of the pedestrian to be identified, to obtain the first local feature of the corresponding body part of the pedestrian to be identified.
The storage medium according to claim 14, wherein, when the at least one computer-readable instruction is executed by the processor to perform the first fusion of the first local feature of each body part of the pedestrian to be identified with the first global feature to obtain the second local feature of each body part of the pedestrian to be identified, the steps specifically include:

calculating the product of the first local feature of each of the multiple body parts of the pedestrian to be identified and the first global feature of the pedestrian to be identified, to obtain the second local feature of the corresponding body part of the pedestrian to be identified.
The storage medium according to claim 14, wherein, before the second local features of each body part of the pedestrian to be identified are input into the pre-trained pedestrian re-identification model, the at least one computer-readable instruction, when executed by the processor, further implements the following steps:

Obtaining a second image sequence of each pedestrian sample, where each pedestrian sample contains multiple body parts;

Inputting the second image sequence of each pedestrian sample into a preset posture recognition network to obtain second position coordinates and second confidence levels of multiple body parts of each pedestrian sample;

Obtaining a third local feature of the corresponding body part according to the second position coordinates and the second confidence level of each body part of each pedestrian sample;

Inputting the second image sequence of each pedestrian sample into a preset multi-layer convolutional neural network to obtain the second global feature of each pedestrian sample;

Performing a first fusion of the third local feature of each body part of each pedestrian sample with the first global feature to obtain a fourth local feature of each body part of each pedestrian sample;

Taking the fourth local features of multiple body parts of each pedestrian sample as the first sample data set;

The first sample data set is input into the channel attention module and the position attention module respectively for processing, and the target channel attention result and the target position attention result of each pedestrian sample are obtained;

Perform a second fusion of the target channel attention result, target position attention result and the second global feature of each pedestrian sample to obtain the third global feature of each pedestrian sample;

using multiple third global features of the multiple pedestrian samples as a second sample data set;

dividing a training set and a test set from the second sample data set;

Inputting the training set into a preset neural network for training to obtain a pedestrian re-identification model;

Input the test set into the pedestrian re-identification model for testing, and calculate the pass rate of the test;

If the test pass rate is greater than or equal to a preset pass rate threshold, it is determined that training of the pedestrian re-identification model is complete; if the test pass rate is less than the preset pass rate threshold, the second sample data set is updated to obtain a new training set, and the new training set is input into the preset neural network to retrain the pedestrian re-identification model.
The storage medium according to claim 17, wherein, when the at least one computer-readable instruction is executed by the processor to input the first sample data set into the channel attention module and the position attention module respectively for processing to obtain the target channel attention result and the target position attention result of each pedestrian sample, the steps specifically include:

Obtaining the fourth local features of each body part of each pedestrian sample from the first sample data set, wherein each body part corresponds to a channel attention module and a position attention module;

Inputting the multiple fourth local features of the multiple body parts of each pedestrian sample into the corresponding channel attention modules and the corresponding position attention modules respectively for weighting processing, to obtain a channel attention result and a position attention result of each body part of each pedestrian sample;

Calculating a first average value of the multiple channel attention results of the multiple body parts of each pedestrian sample, and determining the first average value as the target channel attention result of the corresponding pedestrian sample; and calculating a second average value of the multiple position attention results of the multiple body parts of each pedestrian sample, and determining the second average value as the target position attention result of the corresponding pedestrian sample.
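The claim does not specify the internals of the channel and position attention modules. As one possible reading, the sketch below uses parameter-free attention blocks loosely modeled on the position and channel attention modules of the dual attention network (DANet) design, applied to one (C, H, W) fourth local feature per body part, and then averages the per-part results into the target channel and target position attention results. The per-part scalars `gammas` and `betas` stand in for each body part's own module parameters; all function names here are hypothetical, and real modules would also carry learned projections.

```python
from typing import List, Optional, Sequence, Tuple

import numpy as np


def _softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_attention(feat: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Parameter-free position attention on a (C, H, W) feature map."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)              # (C, N), N = H * W positions
    attn = _softmax(x.T @ x, axis=-1)       # (N, N): similarity between positions
    out = x @ attn.T                        # each position aggregates all positions
    return (gamma * out + x).reshape(c, h, w)


def channel_attention(feat: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Parameter-free channel attention on a (C, H, W) feature map."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)              # (C, N)
    attn = _softmax(x @ x.T, axis=-1)       # (C, C): similarity between channels
    out = attn @ x                          # each channel aggregates all channels
    return (beta * out + x).reshape(c, h, w)


def target_attention_results(
    part_features: Sequence[np.ndarray],
    gammas: Optional[List[float]] = None,   # hypothetical per-part position-module scale
    betas: Optional[List[float]] = None,    # hypothetical per-part channel-module scale
) -> Tuple[np.ndarray, np.ndarray]:
    """Run each body part through its own modules, then average across body parts."""
    n = len(part_features)
    gammas = [1.0] * n if gammas is None else gammas
    betas = [1.0] * n if betas is None else betas
    channel_results = [channel_attention(f, b) for f, b in zip(part_features, betas)]
    position_results = [position_attention(f, g) for f, g in zip(part_features, gammas)]
    target_channel = np.mean(channel_results, axis=0)    # first average value
    target_position = np.mean(position_results, axis=0)  # second average value
    return target_channel, target_position
```

The residual form `gamma * out + x` means that setting a part's scale to 0 passes its feature through unchanged, which makes the averaging step easy to check in isolation.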
The storage medium of claim 17, wherein, when the at least one computer readable instruction is executed by the processor to implement the performing of the second fusion of the target channel attention result, the target position attention result and the second global feature of each pedestrian sample to obtain the third global feature of each pedestrian sample, the second fusion specifically includes:

Computing the product of the target channel attention result, the target position attention result and the second global feature of each pedestrian sample to obtain the third global feature of each pedestrian sample.
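The claim says only "product". Assuming this means element-wise (Hadamard) multiplication, the second fusion can be sketched with NumPy as follows. `second_fusion` is a hypothetical name, and the shape check encodes the further assumption that the third global feature keeps the shape of the second global feature.

```python
import numpy as np


def second_fusion(target_channel: np.ndarray,
                  target_position: np.ndarray,
                  second_global: np.ndarray) -> np.ndarray:
    """Return the third global feature as the element-wise product of the three inputs.

    Assumption: "product" means element-wise multiplication. NumPy broadcasting
    lets a per-channel result of shape (C, 1, 1) and a per-position result of
    shape (1, H, W) scale a (C, H, W) global feature.
    """
    fused = target_channel * target_position * second_global
    if fused.shape != second_global.shape:
        raise ValueError(f"fused shape {fused.shape} != global shape {second_global.shape}")
    return fused


# Usage: 2 channels, a 1 x 2 spatial grid.
channel = np.array([[[2.0]], [[0.5]]])   # (2, 1, 1) per-channel weights
position = np.array([[[1.0, 3.0]]])      # (1, 1, 2) per-position weights
global_feat = np.ones((2, 1, 2))         # (2, 1, 2) second global feature
third_global = second_fusion(channel, position, global_feat)
```

Here `third_global` equals `[[[2.0, 6.0]], [[0.5, 1.5]]]`: each element of the global feature is scaled by its channel weight and its position weight.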
A pedestrian re-identification device, comprising:

An acquisition module, configured to acquire a first image sequence of a pedestrian to be identified and input the first image sequence into a preset posture recognition network to obtain a first local feature of each body part of the pedestrian to be identified, wherein the pedestrian to be identified has multiple body parts;

A first input module, configured to input the first image sequence into a preset multi-layer convolutional neural network to obtain the first global feature of the pedestrian to be identified;

A fusion module, configured to perform a first fusion of the first local feature of each body part of the pedestrian to be identified with the first global feature to obtain a second local feature of each body part of the pedestrian to be identified;

A second input module, configured to input the second local features of the body parts of the pedestrian to be identified into a pre-trained pedestrian re-identification model and receive the pedestrian re-identification result output by the pedestrian re-identification model, wherein the pedestrian re-identification model includes multiple channel attention modules and multiple position attention modules.
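The four modules of the device can be wired together as below. This is a structural sketch only: `PedestrianReIdDevice`, `pose_net`, `global_cnn` and `reid_model` are hypothetical stand-ins for the posture recognition network, the multi-layer convolutional neural network and the pre-trained re-identification model, and the first fusion is assumed to be concatenation of the global feature onto each body part's local feature, which the claim does not state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

import numpy as np


@dataclass
class PedestrianReIdDevice:
    """Wires the claimed modules together; every callable is a hypothetical stand-in."""
    # Acquisition module: image sequence -> first local feature per body part.
    pose_net: Callable[[np.ndarray], Dict[str, np.ndarray]]
    # First input module: image sequence -> first global feature.
    global_cnn: Callable[[np.ndarray], np.ndarray]
    # Second input module: second local features -> re-identification result.
    reid_model: Callable[[Dict[str, np.ndarray]], Any]

    def fuse(self, local_feats: Dict[str, np.ndarray],
             global_feat: np.ndarray) -> Dict[str, np.ndarray]:
        """Fusion module (assumed): concatenate the global feature onto each part's feature."""
        return {part: np.concatenate([feat, global_feat]) for part, feat in local_feats.items()}

    def identify(self, image_sequence: np.ndarray) -> Any:
        first_local = self.pose_net(image_sequence)       # per-body-part local features
        first_global = self.global_cnn(image_sequence)    # whole-pedestrian global feature
        second_local = self.fuse(first_local, first_global)
        return self.reid_model(second_local)


# Usage with toy stand-ins on a (frames, height, width, channels) image sequence.
device = PedestrianReIdDevice(
    pose_net=lambda s: {"head": s.mean(axis=(0, 1, 2)), "torso": s.max(axis=(0, 1, 2))},
    global_cnn=lambda s: np.array([s.mean()]),
    reid_model=lambda feats: {k: v.shape[0] for k, v in feats.items()},
)
result = device.identify(np.ones((2, 4, 4, 3)))
```

With these toy stand-ins each body part's 3-dimensional local feature gains one global dimension, so `result` is `{"head": 4, "torso": 4}`. Keeping the networks as injected callables separates the module wiring in the claim from any particular network architecture.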