CN115953803A

CN115953803A - Training method and device for human body recognition model

Info

Publication number: CN115953803A
Application number: CN202211644144.1A
Authority: CN
Inventors: Name withheld on request
Original assignee: Shenzhen Xumi Yuntu Space Technology Co Ltd
Current assignee: Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date: 2022-12-20
Filing date: 2022-12-20
Publication date: 2023-04-11

Abstract

The disclosure relates to the technical field of machine learning and provides a training method and device for a human body recognition model. The method comprises: constructing a double-flow effective attention network and a multi-stage effective attention network, and connecting them with a residual neural network as the backbone network to obtain a human body recognition model; acquiring a human body training data set and inputting its training samples into the human body recognition model; outputting a plurality of stage features through a plurality of stage networks of the residual neural network, respectively; performing double-flow interactive calculation on each stage feature using the double-flow effective attention network to obtain the interactive feature corresponding to each stage feature; processing each stage feature and its corresponding interactive feature using the multi-stage effective attention network to obtain the salient feature corresponding to each stage feature; and completing the training of the human body recognition model using a loss function based on the salient features corresponding to the stage features.

Description

Training method and device for human body recognition model

Technical Field

The disclosure relates to the technical field of machine learning, in particular to a training method and a training device for a human body recognition model.

Background

In practical human body recognition, the picture to be recognized is often occluded, yet occlusion is insufficiently considered when training human body recognition models. As a result, trained models recognize occluded pictures with low accuracy. For example, the prior art merely includes occluded pictures in the training data; it neither adapts the model structure to the characteristics of occluded pictures nor provides a training method suited to occluded pictures and a structurally improved model.

In the course of implementing the disclosed concept, the inventors found at least the following technical problem in the related art: a human body recognition model trained with a traditional model training method recognizes occluded pictures with low accuracy.

Disclosure of Invention

In view of this, embodiments of the present disclosure provide a training method and apparatus for a human body recognition model, an electronic device, and a computer-readable storage medium, so as to solve the prior-art problem that a human body recognition model trained with a traditional model training method recognizes occluded pictures with low accuracy.

In a first aspect of the embodiments of the present disclosure, a training method for a human body recognition model is provided, including: constructing a double-flow effective attention network and a multi-stage effective attention network, and connecting them with a residual neural network as the backbone network to obtain a human body recognition model; acquiring a human body training data set and inputting its training samples into the human body recognition model, wherein each training sample is annotated with a first preset number of valid regions and a second preset number of invalid regions; outputting a plurality of stage features through a plurality of stage networks of the residual neural network, respectively; performing double-flow interactive calculation on each stage feature using the double-flow effective attention network to obtain the interactive feature corresponding to each stage feature; processing each stage feature and its corresponding interactive feature using the multi-stage effective attention network to obtain the salient feature corresponding to each stage feature; and completing the training of the human body recognition model using a loss function based on the salient features corresponding to the stage features.

In a second aspect of the embodiments of the present disclosure, a training apparatus for a human body recognition model is provided, including: a construction module configured to construct a double-flow effective attention network and a multi-stage effective attention network, and to connect them with a residual neural network as the backbone network to obtain a human body recognition model; an acquisition module configured to acquire a human body training data set and input its training samples into the human body recognition model, wherein each training sample is annotated with a first preset number of valid regions and a second preset number of invalid regions; a first processing module configured to output a plurality of stage features through a plurality of stage networks of the residual neural network, respectively; a second processing module configured to perform double-flow interactive calculation on each stage feature using the double-flow effective attention network to obtain the interactive feature corresponding to each stage feature; a third processing module configured to process each stage feature and its corresponding interactive feature using the multi-stage effective attention network to obtain the salient feature corresponding to each stage feature; and a training module configured to complete the training of the human body recognition model using a loss function based on the salient features corresponding to the stage features.

In a third aspect of the embodiments of the present disclosure, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the above method when executing the computer program.

In a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, which stores a computer program, which when executed by a processor, implements the steps of the above-mentioned method.

Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects: a double-flow effective attention network and a multi-stage effective attention network are constructed and connected with a residual neural network as the backbone network to obtain a human body recognition model; a human body training data set is acquired and its training samples, each annotated with a first preset number of valid regions and a second preset number of invalid regions, are input into the human body recognition model; a plurality of stage features are output through a plurality of stage networks of the residual neural network, respectively; double-flow interactive calculation is performed on each stage feature using the double-flow effective attention network to obtain the corresponding interactive feature; each stage feature and its corresponding interactive feature are processed using the multi-stage effective attention network to obtain the corresponding salient feature; and the training of the human body recognition model is completed using a loss function based on the salient features. These technical means address the prior-art problem that a human body recognition model trained with a traditional model training method recognizes occluded pictures with low accuracy, and thereby improve the model's accuracy on occluded pictures.

Drawings

To more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings needed for the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without inventive efforts.

FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure;

FIG. 2 is a flowchart illustrating a training method of a human body recognition model according to an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of a training apparatus for human body recognition models according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.

A method and an apparatus for training a human body recognition model according to an embodiment of the present disclosure will be described in detail below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure. The application scenario may include terminal devices 101, 102, and 103, a server 104, and a network 105.

The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices that have a display screen and support communication with the server 104, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above. The terminal devices 101, 102, and 103 may be implemented as a plurality of pieces of software or software modules, or as a single piece of software or software module, which is not limited by the embodiments of the present disclosure. Further, various applications, such as data processing applications, instant messaging tools, social platform software, search applications, and shopping applications, may be installed on the terminal devices 101, 102, and 103.

The server 104 may be a server providing various services, for example, a backend server receiving a request sent by a terminal device establishing a communication connection with the server, and the backend server may receive and analyze the request sent by the terminal device and generate a processing result. The server 104 may be a server, may also be a server cluster composed of a plurality of servers, or may also be a cloud computing service center, which is not limited in this disclosure.

The server 104 may be hardware or software. When the server 104 is hardware, it may be various electronic devices that provide various services to the terminal devices 101, 102, and 103. When the server 104 is software, it may be a plurality of pieces of software or software modules, or a single piece of software or software module, providing various services for the terminal devices 101, 102, and 103, which is not limited by the embodiments of the present disclosure.

The network 105 may be a wired network connected by coaxial cable, twisted pair, or optical fiber, or may be a wireless network that interconnects communication devices without wiring, for example Bluetooth, Near Field Communication (NFC), or infrared, which is not limited in the embodiments of the present disclosure.

A user can establish a communication connection with the server 104 via the network 105 through the terminal devices 101, 102, and 103 to receive or transmit information and the like. It should be noted that the specific types, numbers, and combinations of the terminal devices 101, 102, and 103, the server 104, and the network 105 may be adjusted according to the actual requirements of the application scenario, which is not limited by the embodiments of the present disclosure.

Fig. 2 is a schematic flowchart of a training method for a human body recognition model according to an embodiment of the present disclosure. The training method of Fig. 2 may be performed by the computer or server of Fig. 1, or by software on the computer or server. As shown in Fig. 2, the training method for the human body recognition model includes:

S201, constructing a double-flow effective attention network and a multi-stage effective attention network, and connecting them with a residual neural network as the backbone network to obtain a human body recognition model;

S202, acquiring a human body training data set and inputting its training samples into the human body recognition model, wherein each training sample is annotated with a first preset number of valid regions and a second preset number of invalid regions;

S203, outputting a plurality of stage features through a plurality of stage networks of the residual neural network, respectively;

S204, performing double-flow interactive calculation on each stage feature using the double-flow effective attention network to obtain the interactive feature corresponding to each stage feature;

S205, processing each stage feature and its corresponding interactive feature using the multi-stage effective attention network to obtain the salient feature corresponding to each stage feature;

S206, completing the training of the human body recognition model using a loss function based on the salient features corresponding to the stage features.

The residual neural network, for example ResNet50, comprises in sequence a zeroth-stage network Stage0, a first-stage network Stage1, a second-stage network Stage2, a third-stage network Stage3, and a fourth-stage network Stage4. The plurality of stage networks of the residual neural network referred to in the embodiments of the present disclosure are the second-stage network, the third-stage network, and the fourth-stage network.

Connecting the double-flow effective attention network and the multi-stage effective attention network with a residual neural network as the backbone network to obtain the human body recognition model may proceed as follows: a double-flow effective attention network and a multi-stage effective attention network are connected, in that order, after each of the second-stage network, the third-stage network, and the fourth-stage network. The output of each multi-stage effective attention network is input into the stage network that follows the stage network to which it is attached; for example, the output of the multi-stage effective attention network connected after the second-stage network is input into the third-stage network. Because no stage network follows the fourth-stage network, the output of the multi-stage effective attention network connected after it does not need to be input into any other stage network.
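The wiring described above can be sketched in a few lines of Python. This is a minimal illustration of the data flow only, not the patent's implementation: the function name `forward_with_attention` and its arguments are hypothetical, and every network is represented by a plain callable so that the sketch runs without a deep learning framework.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def forward_with_attention(
    x: Any,
    stages: Sequence[Callable[[Any], Any]],
    dual_flow: Dict[int, Callable[[Any], Any]],
    multi_stage: Dict[int, Callable[[Any, Any], Any]],
) -> Tuple[List[Any], List[Any]]:
    """Run Stage0..Stage4 and attach attention after Stage2, Stage3, Stage4.

    stages      -- five callables standing in for Stage0..Stage4 (hypothetical)
    dual_flow   -- stage index -> double-flow effective attention callable
    multi_stage -- stage index -> multi-stage effective attention callable
    """
    stage_features: List[Any] = []
    salient_features: List[Any] = []
    for index, stage in enumerate(stages):
        x = stage(x)
        if index >= 2:  # only Stage2, Stage3 and Stage4 carry attention
            interactive = dual_flow[index](x)
            salient = multi_stage[index](x, interactive)
            stage_features.append(x)
            salient_features.append(salient)
            # The salient feature replaces the stage output as the input of
            # the next stage; after Stage4 the loop simply ends.
            x = salient
    return stage_features, salient_features
```

With toy integer "networks" (each stage adds 1, the double-flow module doubles its input, the multi-stage module adds its two inputs), the call returns the three stage features and the three salient features in order, which makes the feedback into the next stage easy to check by hand.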

It should be noted that the human body training data set includes a plurality of training samples, and each training sample has a plurality of corresponding stage features, interactive features, and salient features.

The trained human body recognition model can be used for various tasks such as human body recognition, human body re-identification, human body retrieval, human body comparison, human body tracking, and human body archiving.

According to the technical solution provided by the embodiments of the present disclosure, a double-flow effective attention network and a multi-stage effective attention network are constructed and connected with a residual neural network as the backbone network to obtain a human body recognition model; a human body training data set is acquired and its training samples, each annotated with a first preset number of valid regions and a second preset number of invalid regions, are input into the human body recognition model; a plurality of stage features are output through a plurality of stage networks of the residual neural network, respectively; double-flow interactive calculation is performed on each stage feature using the double-flow effective attention network to obtain the corresponding interactive feature; each stage feature and its corresponding interactive feature are processed using the multi-stage effective attention network to obtain the corresponding salient feature; and the training of the human body recognition model is completed using a loss function based on the salient features. These technical means address the prior-art problem that a human body recognition model trained with a traditional model training method recognizes occluded pictures with low accuracy, and thereby improve the model's accuracy on occluded pictures.

Outputting a plurality of stage features through a plurality of stage networks of the residual neural network, respectively, includes: outputting a first-stage feature, a second-stage feature, and a third-stage feature through the second-stage network, the third-stage network, and the fourth-stage network of the residual neural network, respectively, wherein the residual neural network comprises a zeroth-stage network, a first-stage network, a second-stage network, a third-stage network, and a fourth-stage network.

Performing double-flow interactive calculation on each stage feature using the double-flow effective attention network to obtain the interactive feature corresponding to each stage feature includes, for each stage feature: sequentially performing deformable convolution, batch normalization, and first activation processing on the stage feature to obtain a first feature; sequentially performing dilated (atrous) convolution, batch normalization, and second activation processing on the stage feature to obtain a second feature; sequentially performing deformable convolution, batch normalization, and first activation processing on the first feature to obtain a third feature; sequentially performing feature stacking, dilated convolution, batch normalization, and second activation processing on the first feature and the second feature to obtain a fourth feature; sequentially performing feature stacking, deformable convolution, batch normalization, and first activation processing on the third feature and the fourth feature to obtain a fifth feature; sequentially performing average pooling and fully connected layer processing on the fifth feature to obtain a sixth feature; sequentially performing deformable convolution, batch normalization, and first activation processing on the fourth feature to obtain a seventh feature; and sequentially performing feature stacking, convolution, and third activation processing on the sixth feature and the seventh feature to obtain the interactive feature corresponding to the stage feature.
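Because the eight intermediate features are easy to confuse in prose, the following sketch restates the dependency graph as code. It is an illustration under stated assumptions: `DualFlowOps` and `dual_flow_interaction` are hypothetical names, each operation is an injected callable rather than a real layer, and the same callable is reused wherever the text names the same kind of block (a real network would presumably give each block its own weights). The text does not say how the pooled sixth feature is reconciled in shape with the seventh feature before stacking, so that detail is left to the `stack` callable.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DualFlowOps:
    """Hypothetical container for the building blocks named in the text."""
    deform_block: Callable[[Any], Any]   # deformable conv -> BN -> first activation
    dilated_block: Callable[[Any], Any]  # dilated conv -> BN -> second activation
    stack: Callable[[Any, Any], Any]     # feature stacking (concatenation)
    pool_fc: Callable[[Any], Any]        # average pooling -> fully connected layer
    conv_act3: Callable[[Any], Any]      # convolution -> third activation


def dual_flow_interaction(stage_feature: Any, ops: DualFlowOps) -> Any:
    """Follow the feature numbering used in the description."""
    first = ops.deform_block(stage_feature)
    second = ops.dilated_block(stage_feature)
    third = ops.deform_block(first)
    fourth = ops.dilated_block(ops.stack(first, second))  # first interaction
    fifth = ops.deform_block(ops.stack(third, fourth))    # second interaction
    sixth = ops.pool_fc(fifth)
    seventh = ops.deform_block(fourth)
    return ops.conv_act3(ops.stack(sixth, seventh))       # interactive feature
```

Passing string-building callables (for example `lambda a: f"D({a})"` for the deformable block) turns the function into a tracer that prints the full expression tree, which is a convenient way to compare an implementation against the description.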

The first activation processing may use a ReLU activation function, the second activation processing may use a hash activation function, and the third activation processing may use a sigmoid activation function. Feature stacking, also called feature concatenation, joins two or more features together to obtain a single feature. Fully connected layer processing means passing the feature through a fully connected layer.
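For readers less familiar with these primitives, here is a small standard-library sketch of three of them. It assumes that feature stacking concatenates along the channel axis, which the text does not state explicitly, and it shows dilated ("cavity") convolution in one dimension only, as an illustration of how the dilation rate spaces out the kernel taps. All function names are hypothetical. The second activation is omitted because the text does not define it further.

```python
import math
from typing import List, Sequence


def relu(x: float) -> float:
    """First activation: max(0, x)."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Third activation: squashes x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def stack_features(a: Sequence, b: Sequence) -> List:
    """Feature stacking, assumed here to be channel-wise concatenation."""
    return list(a) + list(b)


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """1-D dilated convolution without padding.

    Kernel tap k reads signal[i + k * dilation], so a larger dilation widens
    the receptive field without adding kernel weights.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]
```

For instance, a two-tap kernel `[1, 1]` with `dilation=2` applied to `[1, 2, 3, 4, 5]` sums elements that are two positions apart.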

As described above, the double-flow interactive calculation processes the first feature and the second feature corresponding to each stage feature multiple times, with the two flows interacting multiple times along the way.

Processing each stage feature and its corresponding interactive feature using the multi-stage effective attention network to obtain the salient feature corresponding to each stage feature includes: sequentially performing feature stacking, deformable convolution, batch normalization, and first activation processing on each stage feature and the interactive feature corresponding to that stage feature to obtain the salient feature corresponding to the stage feature.
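The stack, convolve, normalize, activate sequence can be illustrated with a deliberately simplified sketch. Two substitutions are made and should be read as assumptions: a pointwise (1x1) convolution stands in for the deformable convolution, and normalization is computed per channel over the positions of a single sample rather than over a batch. Features are lists of channels, each channel a flat list of floats; all names are hypothetical.

```python
import math
from typing import List, Sequence

Channels = List[List[float]]


def pointwise_conv(channels: Channels,
                   weights: Sequence[Sequence[float]]) -> Channels:
    """1x1 convolution: every output channel is a weighted sum of inputs.

    Stand-in for the deformable convolution named in the text.
    """
    positions = len(channels[0])
    return [
        [sum(w * ch[p] for w, ch in zip(row, channels)) for p in range(positions)]
        for row in weights
    ]


def normalize(channel: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Zero-mean, unit-variance normalization of one channel (simplified BN)."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


def salient_feature(stage_feature: Channels, interactive_feature: Channels,
                    weights: Sequence[Sequence[float]]) -> Channels:
    """stack -> convolution -> normalization -> ReLU, in that order."""
    stacked = list(stage_feature) + list(interactive_feature)
    mixed = pointwise_conv(stacked, weights)
    return [[max(0.0, v) for v in normalize(ch)] for ch in mixed]
```

The weight matrix has one row per output channel and one column per stacked input channel, so its width must equal the combined channel count of the two inputs.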

Completing the training of the human body recognition model using a loss function based on the salient features corresponding to the stage features includes: based on the salient features corresponding to the stage features, using a first loss function and a second loss function to train the human body recognition model to recognize the valid regions and invalid regions of a picture; and, based on the salient features corresponding to the stage features, using a third loss function to train the human body recognition model to recognize the target object in a picture. The training to recognize valid and invalid regions and the training to recognize the target object are carried out simultaneously.

Using the first loss function and the second loss function to train the human body recognition model to recognize the valid regions and invalid regions of a picture, based on the salient features corresponding to the stage features, includes: determining, on the salient feature corresponding to each stage feature, a first partial feature corresponding to the valid regions annotated on the training sample; determining, on the salient feature corresponding to each stage feature, a second partial feature corresponding to the invalid regions annotated on the training sample; using the first loss function, based on the first partial feature corresponding to each stage feature, to train the human body recognition model to recognize the valid regions of a picture; and using the second loss function, based on the second partial feature corresponding to each stage feature, to train the human body recognition model to recognize the invalid regions of a picture. The training to recognize valid regions and the training to recognize invalid regions are carried out simultaneously.

The features in the present disclosure can be understood as feature maps. A salient feature is a feature map, and the first partial feature corresponding to it is the data at a plurality of points on that feature map, the number of points being equal to the first preset number. The first partial feature corresponding to a stage feature is the first partial feature on the salient feature corresponding to that stage feature. The same applies to the second partial feature, whose number of points is equal to the second preset number.
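Picking out a partial feature therefore amounts to reading the feature map at a list of annotated points. The sketch below assumes a channels-by-height-by-width layout and assumes that image coordinates are mapped to feature-map coordinates by integer division by the stage's downsampling stride; neither detail is given in the text, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def to_feature_coords(point: Point, stride: int) -> Point:
    """Map an (row, col) image coordinate onto the feature map.

    Integer division by the stride is an assumption, not stated in the text.
    """
    row, col = point
    return row // stride, col // stride


def gather_points(feature_map: Sequence[Sequence[Sequence[float]]],
                  points: Sequence[Point]) -> List[List[float]]:
    """Return one channel vector per annotated point.

    feature_map is indexed as [channel][row][col]; the result has
    len(points) entries, matching the preset number of regions.
    """
    return [[channel[r][c] for channel in feature_map] for r, c in points]
```

Calling `gather_points` once with the valid-region points and once with the invalid-region points yields the first and second partial features for one salient feature.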

In an alternative embodiment, the method comprises: using the first loss function to train the human body recognition model to recognize the valid regions of a picture, based on the probability with which the model recognizes the first partial feature corresponding to each stage feature as a feature corresponding to a valid region in the training sample; and using the second loss function to train the human body recognition model to recognize the invalid regions of a picture, based on the probability with which the model recognizes the second partial feature corresponding to each stage feature as a feature corresponding to an invalid region in the training sample.

The first partial features and second partial features corresponding to each stage feature are determined from the annotation information of the training samples. The human body recognition model, in turn, estimates the probability that the first partial feature corresponding to each stage feature is a feature corresponding to a valid region in the training sample. The same applies to the second partial feature.

The first loss function loss1 is as follows:

$$\mathrm{loss}_1 = -\sum_i \log(p_i)$$

Here $p_i$ is the probability that the first partial feature corresponding to the i-th stage feature is a feature corresponding to a valid region in the training sample.

The second loss function loss2 is as follows:

$$\mathrm{loss}_2 = -\sum_i \log(1 - p_i)$$

Here $p_i$ is the probability that the second partial feature corresponding to the i-th stage feature is a feature corresponding to a valid region in the training sample.
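The two formulas translate directly into code. The sketch below follows the definitions given with the formulas, so both functions take probabilities of belonging to a valid region: loss1 rewards high probabilities on the first partial features, and loss2 rewards low probabilities on the second partial features. The function names and the small `eps` clamp that avoids `log(0)` are additions for illustration, not part of the text.

```python
import math
from typing import Sequence


def loss1(p_first: Sequence[float], eps: float = 1e-12) -> float:
    """loss1 = -sum_i log(p_i), p_i taken on the first partial features."""
    return -sum(math.log(max(p, eps)) for p in p_first)


def loss2(p_second: Sequence[float], eps: float = 1e-12) -> float:
    """loss2 = -sum_i log(1 - p_i), p_i taken on the second partial features."""
    return -sum(math.log(max(1.0 - p, eps)) for p in p_second)
```

Both losses are zero when the model is perfectly confident and correct (all `p_i` equal to 1 for loss1, all equal to 0 for loss2) and grow without bound as the predictions approach the wrong extreme, which is the usual behaviour of a negative log-likelihood.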

The third loss function is the loss function of the residual neural network itself; it trains the human body recognition model to recognize the target object in a picture. It should be noted that, in addition to the first preset number of valid regions and the second preset number of invalid regions, each training sample is also annotated with the identifier of the target object in the training sample.

It should be noted that training the human body recognition model to recognize the valid and invalid regions of a picture and training it to recognize the target object in a picture are related. Training the model to recognize the target object in effect trains it to recognize the target object on the basis of the valid regions and invalid regions of the picture.
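Since the three trainings run simultaneously, one natural reading is that the three losses are combined into a single objective per training step. The sketch below is only that reading made concrete: the text does not name the third loss beyond "the loss function of the residual neural network itself", so softmax cross-entropy over target identifiers is assumed here, and the unweighted sum (weights of 1.0) is likewise an assumption. All names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample (assumed form of the third loss).

    Uses the log-sum-exp trick for numerical stability.
    """
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def total_loss(l1: float, l2: float, l3: float,
               weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three losses; equal weights are an assumption."""
    return weights[0] * l1 + weights[1] * l2 + weights[2] * l3
```

Minimizing the combined value in each step updates the shared backbone and attention networks with gradients from all three objectives at once, which is one way the region training and the target-object training could be coupled as the paragraph above describes.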

All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.

The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.

Fig. 3 is a schematic diagram of a training apparatus for a human body recognition model according to an embodiment of the present disclosure. As shown in Fig. 3, the training apparatus for the human body recognition model includes:

a construction module 301 configured to construct a double-flow effective attention network and a multi-stage effective attention network, and to connect them with a residual neural network as the backbone network to obtain a human body recognition model;

an acquisition module 302 configured to acquire a human body training data set and input its training samples into the human body recognition model, wherein each training sample is annotated with a first preset number of valid regions and a second preset number of invalid regions;

a first processing module 303 configured to output a plurality of stage features through a plurality of stage networks of the residual neural network, respectively;

a second processing module 304 configured to perform double-flow interactive calculation on each stage feature using the double-flow effective attention network to obtain the interactive feature corresponding to each stage feature;

a third processing module 305 configured to process each stage feature and its corresponding interactive feature using the multi-stage effective attention network to obtain the salient feature corresponding to each stage feature;

and a training module 306 configured to complete the training of the human body recognition model using a loss function based on the salient features corresponding to the stage features.

Connecting the double-flow effective attention network and the multi-stage effective attention network with a residual neural network as the backbone network to obtain the human body recognition model may proceed as follows: a double-flow effective attention network and a multi-stage effective attention network are connected, in that order, after each of the second-stage network, the third-stage network, and the fourth-stage network. The output of each multi-stage effective attention network is input into the stage network that follows the stage network to which it is attached; for example, the output of the multi-stage effective attention network connected after the second-stage network is input into the third-stage network. Because no stage network follows the fourth-stage network, the output of the multi-stage effective attention network connected after it does not need to be input into any other stage network.

According to the technical solution provided by the embodiments of the present disclosure, a double-flow effective attention network and a multi-stage effective attention network are constructed and connected by taking a residual neural network as a backbone network to obtain a human body recognition model; a human body training data set is acquired, and training samples in the human body training data set are input into the human body recognition model, wherein the training samples are labeled with a first preset number of valid regions and a second preset number of invalid regions; a plurality of stage features are output through a plurality of stage networks of the residual neural network, respectively; double-flow interactive calculation is performed on each stage feature by using the double-flow effective attention network to obtain the interactive feature corresponding to each stage feature; each stage feature and the interactive feature corresponding to the stage feature are processed by using the multi-stage effective attention network to obtain the significant feature corresponding to each stage feature; and the training of the human body recognition model is completed by using a loss function based on the significant feature corresponding to each stage feature. These technical means address the problem in the prior art that a human body recognition model trained by a conventional model training method has low accuracy when recognizing occluded images, and thereby improve the accuracy of the model in recognizing occluded images.

Optionally, the first processing module 303 is further configured to output a first stage feature, a second stage feature and a third stage feature through the second-stage network, the third-stage network and the fourth-stage network of the residual neural network, respectively; the residual neural network comprises a zeroth-stage network, a first-stage network, a second-stage network, a third-stage network and a fourth-stage network.

Optionally, the second processing module 304 is further configured to, for each stage feature: sequentially perform deformable convolution calculation, batch normalization processing and first activation processing on the stage feature to obtain a first feature corresponding to the stage feature; sequentially perform dilated (atrous) convolution calculation, batch normalization processing and second activation processing on the stage feature to obtain a second feature corresponding to the stage feature; sequentially perform deformable convolution calculation, batch normalization processing and first activation processing on the first feature to obtain a third feature corresponding to the stage feature; sequentially perform feature stacking processing, dilated convolution calculation, batch normalization processing and second activation processing on the first feature and the second feature to obtain a fourth feature corresponding to the stage feature; sequentially perform feature stacking processing, deformable convolution calculation, batch normalization processing and first activation processing on the third feature and the fourth feature to obtain a fifth feature corresponding to the stage feature; sequentially perform average pooling processing and fully connected layer processing on the fifth feature to obtain a sixth feature corresponding to the stage feature; sequentially perform deformable convolution calculation, batch normalization processing and first activation processing on the fourth feature to obtain a seventh feature corresponding to the stage feature; and sequentially perform feature stacking processing, convolution calculation and third activation processing on the sixth feature and the seventh feature to obtain the interactive feature corresponding to the stage feature.

The first activation processing may be processing using a ReLU activation function, the second activation processing may be processing using a hash activation function, and the third activation processing may be processing using a sigmoid activation function. The feature stacking processing, also called feature stitching (concatenation) processing, stitches two or more features together to obtain one feature. The fully connected layer processing is performed by a fully connected layer network.

Optionally, the third processing module 305 is further configured to sequentially perform feature stacking processing, deformable convolution calculation, batch normalization processing and first activation processing on each stage feature and the interactive feature corresponding to the stage feature, to obtain the significant feature corresponding to each stage feature.

Optionally, the training module 306 is further configured to: complete, based on the significant feature corresponding to each stage feature, the training of the human body recognition model to recognize the valid region and the invalid region of a picture by using a first loss function and a second loss function; and complete, based on the significant feature corresponding to each stage feature, the training of the human body recognition model to recognize the target object in the picture by using a third loss function; wherein the training of the human body recognition model to recognize the valid region and the invalid region of the picture and the training of the human body recognition model to recognize the target object in the picture are performed simultaneously.

Optionally, the training module 306 is further configured to: determine, on the significant feature corresponding to each stage feature, a first partial feature corresponding to the valid region labeled on the training sample; determine, on the significant feature corresponding to each stage feature, a second partial feature corresponding to the invalid region labeled on the training sample; complete, based on the first partial feature corresponding to each stage feature, the training of the human body recognition model to recognize the valid region of a picture by using the first loss function; and complete, based on the second partial feature corresponding to each stage feature, the training of the human body recognition model to recognize the invalid region of the picture by using the second loss function; wherein the training of the human body recognition model to recognize the valid region of the picture and the training of the human body recognition model to recognize the invalid region of the picture are performed simultaneously.

Optionally, the training module 306 is further configured to: complete the training of the human body recognition model to recognize the valid region of a picture by using the first loss function, based on the probability, as recognized by the human body recognition model, that the first partial feature corresponding to each stage feature is a feature corresponding to the valid region in the training sample; and complete the training of the human body recognition model to recognize the invalid region of the picture by using the second loss function, based on the probability, as recognized by the human body recognition model, that the second partial feature corresponding to each stage feature is a feature corresponding to the invalid region in the training sample.

The first partial feature and the second partial feature corresponding to each stage feature are determined according to the labeling information of the training sample. In addition, the human body recognition model judges the probability that the first partial feature corresponding to each stage feature is a feature corresponding to the valid region in the training sample; the same applies to the second partial feature.
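As an illustration of how the labeling information could select the first and second partial features, the following minimal sketch picks the positions of a significant feature map that fall inside labeled regions. The rectangular box format, the assumption that boxes are already expressed in feature-map coordinates, and the name `split_partial_features` are all hypothetical; the disclosure does not specify how regions are represented.

```python
from typing import List, Sequence, Tuple

# Assumed region format: (top, left, bottom, right), end-exclusive,
# already scaled to the resolution of the significant feature map.
Box = Tuple[int, int, int, int]


def split_partial_features(
    feature_map: Sequence[Sequence[float]],
    valid_boxes: Sequence[Box],
    invalid_boxes: Sequence[Box],
) -> Tuple[List[float], List[float]]:
    """Return (first partial feature, second partial feature) as flat lists."""

    def collect(boxes: Sequence[Box]) -> List[float]:
        picked = []
        for top, left, bottom, right in boxes:
            for r in range(top, bottom):
                for c in range(left, right):
                    picked.append(feature_map[r][c])
        return picked

    return collect(valid_boxes), collect(invalid_boxes)


# Toy 3x3 significant feature map with one valid and one invalid region.
fmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
first_part, second_part = split_partial_features(
    fmap, valid_boxes=[(0, 0, 2, 2)], invalid_boxes=[(2, 2, 3, 3)]
)
print(first_part, second_part)  # [1, 2, 4, 5] [9]
```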

The first loss function loss1 is as follows:

$$\mathrm{loss}_1 = -\sum_i \log(p_i)$$

The second loss function loss2 is as follows:

$$\mathrm{loss}_2 = -\sum_i \log(1 - p_i)$$
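The two formulas can be evaluated directly, as in the minimal sketch below. It assumes that p_i is the probability the model assigns to the i-th partial feature belonging to a valid region, so that loss1 is summed over first partial features and loss2 over second partial features; this reading, the clamping constant `EPS`, and the plain sum used to combine the two terms are assumptions for illustration, not statements of the disclosure.

```python
import math
from typing import Sequence

EPS = 1e-12  # assumed clamp to keep log() finite; not part of the formulas


def _clamp(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def loss1(p_first_part: Sequence[float]) -> float:
    """loss1 = -sum(log(p_i)) over the first partial features."""
    return -sum(math.log(_clamp(p)) for p in p_first_part)


def loss2(p_second_part: Sequence[float]) -> float:
    """loss2 = -sum(log(1 - p_i)) over the second partial features."""
    return -sum(math.log(1.0 - _clamp(p)) for p in p_second_part)


def region_loss(p_first_part: Sequence[float], p_second_part: Sequence[float]) -> float:
    """Assumed combination for simultaneous training: a plain sum."""
    return loss1(p_first_part) + loss2(p_second_part)


# Confident, correct predictions give a small loss; uncertain ones a larger loss.
good = region_loss([0.9, 0.95], [0.05])
poor = region_loss([0.5, 0.5], [0.5])
print(good < poor)  # True
```

Under this reading, loss1 falls as the model becomes more confident that first partial features are valid, and loss2 falls as it becomes more confident that second partial features are not.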

It should be noted that the training of the human body recognition model to recognize the valid region and the invalid region of a picture and the training of the human body recognition model to recognize the target object in the picture are related. Training the human body recognition model to recognize the target object in the picture in effect trains the model to recognize the target object based on the valid region and the invalid region of the picture.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.

Fig. 4 is a schematic diagram of an electronic device 4 provided by the embodiment of the present disclosure. As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps in the various method embodiments described above are implemented when the processor 401 executes the computer program 403. Alternatively, the processor 401 implements the functions of the respective modules/units in the above-described respective apparatus embodiments when executing the computer program 403.

The electronic device 4 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another electronic device. The electronic device 4 may include, but is not limited to, a processor 401 and a memory 402. Those skilled in the art will appreciate that Fig. 4 is merely an example of the electronic device 4 and does not constitute a limitation on the electronic device 4, which may include more or fewer components than shown, or different components.

The processor 401 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.

The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or an internal memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card, or the like provided on the electronic device 4. The memory 402 may also include both an internal storage unit of the electronic device 4 and an external storage device. The memory 402 is used for storing the computer program and other programs and data required by the electronic device.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods in the above embodiments of the present disclosure may also be implemented by a computer program instructing related hardware, where the computer program may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above method embodiments may be implemented. The computer program may comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately added to or reduced as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals in accordance with legislation and patent practice.

The above examples are only intended to illustrate the technical solutions of the present disclosure, not to limit them; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present disclosure, and are intended to be included within the scope of the present disclosure.

Claims

1. A training method of a human body recognition model is characterized by comprising the following steps:

constructing a double-flow effective attention network and a multi-stage effective attention network, and connecting the double-flow effective attention network and the multi-stage effective attention network by taking a residual neural network as a backbone network to obtain a human body recognition model;

acquiring a human body training data set, and inputting training samples in the human body training data set into the human body recognition model, wherein the training samples are labeled with a first preset number of valid regions and a second preset number of invalid regions;

outputting a plurality of stage features through a plurality of stage networks of the residual neural network, respectively;

performing double-flow interactive calculation on each stage feature by using the double-flow effective attention network to obtain an interactive feature corresponding to each stage feature;

processing each stage feature and the interactive feature corresponding to the stage feature by using the multi-stage effective attention network to obtain a significant feature corresponding to each stage feature;

and finishing the training of the human body recognition model by using a loss function based on the significant features corresponding to the features of each stage.

2. The method of claim 1, wherein the outputting a plurality of stage features through a plurality of stage networks of the residual neural network, respectively, comprises:

outputting a first stage feature, a second stage feature and a third stage feature through a second-stage network, a third-stage network and a fourth-stage network of the residual neural network, respectively;

wherein the residual neural network comprises a zeroth-stage network, a first-stage network, the second-stage network, the third-stage network and the fourth-stage network.

3. The method according to claim 1, wherein the performing double-flow interactive calculation on each stage feature by using the double-flow effective attention network to obtain an interactive feature corresponding to each stage feature comprises:

for each stage feature:

sequentially performing deformable convolution calculation, batch normalization processing and first activation processing on the stage feature to obtain a first feature corresponding to the stage feature;

sequentially performing dilated convolution calculation, the batch normalization processing and second activation processing on the stage feature to obtain a second feature corresponding to the stage feature;

sequentially performing the deformable convolution calculation, the batch normalization processing and the first activation processing on the first feature to obtain a third feature corresponding to the stage feature;

sequentially performing feature stacking processing, the dilated convolution calculation, the batch normalization processing and the second activation processing on the first feature and the second feature to obtain a fourth feature corresponding to the stage feature;

sequentially performing the feature stacking processing, the deformable convolution calculation, the batch normalization processing and the first activation processing on the third feature and the fourth feature to obtain a fifth feature corresponding to the stage feature;

sequentially performing average pooling processing and fully connected layer processing on the fifth feature to obtain a sixth feature corresponding to the stage feature;

sequentially performing the deformable convolution calculation, the batch normalization processing and the first activation processing on the fourth feature to obtain a seventh feature corresponding to the stage feature;

and sequentially performing the feature stacking processing, convolution calculation and third activation processing on the sixth feature and the seventh feature to obtain the interactive feature corresponding to the stage feature.

4. The method according to claim 1, wherein the processing each stage feature and the interactive feature corresponding to the stage feature by using the multi-stage effective attention network to obtain the significant feature corresponding to each stage feature comprises:

sequentially performing feature stacking processing, deformable convolution calculation, batch normalization processing and first activation processing on each stage feature and the interactive feature corresponding to the stage feature to obtain the significant feature corresponding to each stage feature.

5. The method according to claim 1, wherein the completing the training of the human body recognition model by using a loss function based on the significant feature corresponding to each stage feature comprises:

completing, based on the significant feature corresponding to each stage feature, training of the human body recognition model to recognize a valid region and an invalid region of a picture by using a first loss function and a second loss function;

completing, based on the significant feature corresponding to each stage feature, training of the human body recognition model to recognize a target object in the picture by using a third loss function;

wherein the training of the human body recognition model to recognize the valid region and the invalid region of the picture and the training of the human body recognition model to recognize the target object in the picture are performed simultaneously.

6. The method according to claim 5, wherein the completing, based on the significant feature corresponding to each stage feature, training of the human body recognition model to recognize the valid region and the invalid region of the picture by using a first loss function and a second loss function comprises:

determining, on the significant feature corresponding to each stage feature, a first partial feature corresponding to the valid region labeled on the training sample;

determining, on the significant feature corresponding to each stage feature, a second partial feature corresponding to the invalid region labeled on the training sample;

completing, based on the first partial feature corresponding to each stage feature, training of the human body recognition model to recognize the valid region of the picture by using the first loss function;

completing, based on the second partial feature corresponding to each stage feature, training of the human body recognition model to recognize the invalid region of the picture by using the second loss function;

wherein the training of the human body recognition model to recognize the valid region of the picture and the training of the human body recognition model to recognize the invalid region of the picture are performed simultaneously.

7. The method of claim 6, comprising:

completing the training of the human body recognition model to recognize the valid region of the picture by using the first loss function, based on the probability, as recognized by the human body recognition model, that the first partial feature corresponding to each stage feature is a feature corresponding to the valid region in the training sample;

and completing the training of the human body recognition model to recognize the invalid region of the picture by using the second loss function, based on the probability, as recognized by the human body recognition model, that the second partial feature corresponding to each stage feature is a feature corresponding to the invalid region in the training sample.

8. A training device for a human body recognition model, characterized by comprising:

a construction module configured to construct a double-flow effective attention network and a multi-stage effective attention network, and to connect the double-flow effective attention network and the multi-stage effective attention network by taking a residual neural network as a backbone network to obtain a human body recognition model;

an acquisition module configured to acquire a human body training data set and input training samples in the human body training data set into the human body recognition model, wherein the training samples are labeled with a first preset number of valid regions and a second preset number of invalid regions;

a first processing module configured to output a plurality of stage features through a plurality of stage networks of the residual neural network, respectively;

the second processing module is configured to perform double-flow interactive calculation on each stage feature by using the double-flow effective attention network to obtain an interactive feature corresponding to each stage feature;

a third processing module configured to process each stage feature and the interactive feature corresponding to the stage feature by using the multi-stage effective attention network to obtain a significant feature corresponding to each stage feature;

and the training module is configured to complete the training of the human body recognition model by using the loss function based on the significant features corresponding to the features of each stage.

9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.