CN115375292A - Payment intention recognition method, device, electronic equipment, medium and program product

Info

Publication number: CN115375292A
Authority: CN (China)
Prior art keywords: target, image, brushing, willingness, face brushing
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN202210960359.8A
Other languages: Chinese (zh)
Inventors: 尹英杰, 丁菁汀, 李亮
Current and original assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd

Classifications

    • G06Q20/065 — Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme, using e-cash
    • G06Q20/102 — Payment architectures specially adapted for electronic funds transfer [EFT] systems; Bill distribution or payments
    • G06Q20/40145 — Transaction verification; Identity check for transactions; Biometric identity checks
    • G06V40/161 — Human faces, e.g. facial parts, sketches or expressions; Detection; Localisation; Normalisation
    • G06V40/168 — Human faces; Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Finance (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of this specification disclose a method, an apparatus, an electronic device, a medium and a program product for recognizing willingness to pay. The method comprises: acquiring a target face brushing image that contains a target face brushing user; generating a corresponding target mask map based on a target position of the target face brushing user in the target face brushing image, where the target mask map distinguishes the face area of the target face brushing user from the other areas outside the face area; and inputting the target face brushing image and the target mask map into a willingness-to-pay recognition model to obtain a gazing feature and a limb behavior feature, and outputting a recognition result corresponding to the target face brushing user based on the gazing feature and the limb behavior feature. The willingness-to-pay recognition model is trained on face brushing images with known willingness-to-pay information corresponding to a plurality of face brushing users.

Description

Payment intention recognition method, device, electronic equipment, medium and program product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, an electronic device, a medium, and a program product for willingness-to-pay identification.
Background
Face brushing payment is a novel payment mode built on technologies such as artificial intelligence, machine vision, 3D sensing and big data; by using face recognition for authentication, it brings great convenience to users.
At present, in a face brushing payment scenario, a user who wants to pay must, after initiating face brushing payment, stand in front of a device with the face brushing payment function so that face recognition can be performed. However, when an offline face brushing device is used for face brushing payment, two risks arise. On the one hand, an illegal person may use the face brushing device to fraudulently pay with another person's face while that person is not paying attention. On the other hand, multiple users may stand in front of the face brushing device, i.e., multiple users appear in the face brushing image collected by the device; user A may then initiate face brushing payment while user B is mistakenly charged. Both situations easily give rise to public concern over the safety of face brushing payment.
Based on this, recognizing the willingness to pay is an important link in safeguarding face brushing payment and helps improve the perceived safety of face brushing. Since both the fraudulent brushing and the mistaken brushing described above reduce the security of face brushing payment, a safer willingness-to-pay recognition scheme for face brushing payment is needed.
Disclosure of Invention
The embodiments of this specification provide a method, an apparatus, an electronic device, a medium and a program product for recognizing willingness to pay, in which the willingness to pay of a target face brushing user is recognized jointly from the gaze-related and limb-behavior-related features of that user in a target face brushing image. This improves the accuracy of willingness-to-pay recognition, alleviates the problems of fraudulent brushing and mistaken brushing in face brushing payment, safeguards the security of face brushing payment, and improves the user's sense of security in face brushing payment. The technical scheme is as follows:
in a first aspect, an embodiment of the present specification provides a willingness-to-pay identification method, including:
acquiring a target face brushing image; the target face brushing image comprises a target face brushing user;
generating a corresponding target mask image based on a target position of the target face brushing user in the target face brushing image; the target mask map is used for distinguishing the face area of the target face brushing user from other areas except the face area;
inputting the target face brushing image and the target mask map into a willingness-to-pay recognition model to obtain a gazing feature and a limb behavior feature, and outputting a recognition result corresponding to the target face brushing user based on the gazing feature and the limb behavior feature; the willingness-to-pay recognition model is trained on face brushing images with known willingness-to-pay information corresponding to a plurality of face brushing users.
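For illustration only, the following Python sketch shows one way the three steps above might be wired together. The face detector output, the mask-building helper and the model interface are hypothetical placeholders and are not fixed by this specification:

    import numpy as np

    def build_face_mask(img_hw, face_box):
        # Target mask map: fill value 1 in the face area, 0 elsewhere.
        h, w = img_hw
        x1, y1, x2, y2 = face_box
        mask = np.zeros((h, w), dtype=np.float32)
        mask[y1:y2, x1:x2] = 1.0
        return mask

    def recognize_willingness(image, face_box, model):
        # Step 1: image is the acquired target face brushing image.
        # Step 2: generate the target mask map from the target position
        # (face_box is assumed to come from a face detector).
        mask = build_face_mask(image.shape[:2], face_box)
        # Step 3: the model returns the recognition result from the gazing
        # feature and the limb behavior feature it extracts internally.
        return model(image, mask)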
In one possible implementation manner, after the acquiring of the target face brushing image, and before the inputting of the target face brushing image and the target mask map into the willingness-to-pay recognition model to obtain the gazing feature and the limb behavior feature and the outputting of the recognition result corresponding to the target face brushing user based on the gazing feature and the limb behavior feature, the method further includes:
determining a target human body area image of the target face brushing user based on the face area of the target face brushing user and the target face brushing image; the target body area image includes a limb of the target face brushing user;
the inputting the target face brushing image and the target mask image into a willingness-to-pay recognition model to obtain a gazing feature and a body behavior feature, and outputting a recognition result corresponding to the target face brushing user based on the gazing feature and the body behavior feature, includes:
inputting the target human body area image and the target mask image into a willingness-to-pay recognition model to obtain a gazing feature and a limb behavior feature, and outputting a recognition result corresponding to the target face brushing user based on the gazing feature and the limb behavior feature.
In one possible implementation manner, the inputting the target human body region image and the target mask map into a willingness-to-pay recognition model to obtain a gazing feature and a body behavior feature, and outputting a recognition result corresponding to the target face-brushing user based on the gazing feature and the body behavior feature includes:
extracting a first feature corresponding to the target human body area image;
fusing the first feature and the target mask map to generate a second feature;
determining the limb behavior characteristics corresponding to the target human body area image based on the first characteristics;
determining a gazing characteristic corresponding to the target human body area image based on the second characteristic;
and determining the corresponding recognition result of the target face brushing user according to the gazing characteristics and the body behavior characteristics.
In a possible implementation manner, the recognition result includes a willingness-to-pay recognition result;
after the target face brushing image and the target mask image are input into a willingness-to-pay recognition model to obtain gazing features and body behavior features, and recognition results corresponding to the target face brushing user are output based on the gazing features and the body behavior features, the method further includes:
and determining whether the target face brushing user has willingness to pay or not based on the identification result.
In a possible implementation manner, the recognition result further includes a gaze recognition result and a payment behavior recognition result;
the determining whether the target user swipes his/her face based on the recognition result includes:
and under the condition that the recognition result of the willingness-to-pay does not meet the preset condition, determining whether the target face brushing user has willingness-to-pay according to the gaze recognition result and/or the payment behavior recognition result.
In one possible implementation, the target face brushing image includes a plurality of face brushing users, and the plurality of face brushing users includes the target face brushing user;
after the target face brushing image is acquired, before the target face brushing user generates a corresponding target mask map at a target position in the target face brushing image, the method further includes:
and determining a target face brushing user from the plurality of face brushing users according to a preset rule.
In a possible implementation manner, the recognition result includes a willingness-to-pay recognition result; the willingness-to-pay recognition model comprises a first target convolutional network, a second target convolutional network and a first full connection layer;
the first target convolutional network is configured to process the target human body region image and the target mask map to obtain a gazing feature corresponding to the target human body region image;
the second target convolutional network is configured to process the target human body area image to obtain a limb behavior characteristic corresponding to the target human body area image;
and the first full connection layer is used for fusing the watching characteristics and the limb behavior characteristics and outputting a payment intention recognition result corresponding to the target face brushing user.
In a possible implementation manner, the first target convolutional network includes a first convolutional module, a second convolutional module, and a third convolutional module;
the first convolution module is used for extracting a first feature corresponding to the target human body area image;
the second convolution module is configured to fuse the first feature and the target mask map to generate a second feature;
and the third convolution module is configured to generate a gazing feature corresponding to the target human body region image based on the second feature.
In a possible implementation manner, the second target convolutional network includes a first convolutional module and a fourth convolutional module;
the first convolution module is used for extracting a first feature corresponding to the target human body area image;
the fourth convolution module is configured to generate a limb behavior feature corresponding to the target human body region image based on the first feature.
In a possible implementation manner, the recognition result further includes a gaze recognition result and a payment behavior recognition result; the willingness-to-pay recognition model further comprises a second fully connected layer and a third fully connected layer;
the second fully connected layer is configured to output the gaze recognition result corresponding to the target face brushing user based on the gazing feature;
the third fully connected layer is configured to output the payment behavior recognition result corresponding to the target face brushing user based on the limb behavior feature.
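A minimal PyTorch sketch of the model wiring described above is given below. The layer widths, the pooling, and the two-class heads are illustrative assumptions; only the module topology (a shared first convolution module, mask fusion in the second module, separate gaze and limb branches, and three fully connected heads) follows the text:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def conv_block(c_in, c_out):
        # Widths and depths are illustrative; the text only names the modules.
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    class WillingnessModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = conv_block(3, 32)       # first convolution module (shared)
            self.conv2 = conv_block(32 + 1, 32)  # second module: fuses feature + mask
            self.conv3 = conv_block(32, 64)      # third module: gazing feature
            self.conv4 = conv_block(32, 64)      # fourth module: limb behavior feature
            self.fc1 = nn.Linear(64 + 64, 2)     # first fully connected layer: willingness
            self.fc2 = nn.Linear(64, 2)          # second fully connected layer: gaze
            self.fc3 = nn.Linear(64, 2)          # third fully connected layer: payment behavior

        def forward(self, image, mask):
            f1 = self.conv1(image)                                  # first feature
            m = F.interpolate(mask, size=f1.shape[-2:], mode="nearest")
            f2 = self.conv2(torch.cat([f1, m], dim=1))              # second feature
            gaze = self.conv3(f2).mean(dim=(2, 3))                  # pooled gazing feature
            limb = self.conv4(f1).mean(dim=(2, 3))                  # pooled limb behavior feature
            return {
                "willingness": self.fc1(torch.cat([gaze, limb], dim=1)),
                "gaze": self.fc2(gaze),
                "payment_behavior": self.fc3(limb),
            }

For a 224x224 input, calling the model with an image of shape (1, 3, 224, 224) and a mask map of shape (1, 1, 224, 224) returns the three recognition results.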
In a possible implementation manner, the first target convolutional network is obtained by training based on human body region images of known willingness-to-pay information and/or gazing information corresponding to the plurality of face-brushing users and a mask map;
and the second target convolutional network is obtained by training based on the human body region images of the known willingness-to-pay information and/or payment behavior information corresponding to the plurality of face brushing users.
In a possible implementation, the target human body region image has the same resolution as the target mask image.
In a second aspect, an embodiment of the present specification provides a willingness-to-pay recognition apparatus, including:
the acquisition module is used for acquiring a target face brushing image; the target face brushing image comprises a target face brushing user;
a generating module, configured to generate a corresponding target mask map based on a target position of the target face brushing user in the target face brushing image; the target mask map is used for distinguishing the face area of the target face brushing user from other areas except the face area;
a willingness-to-pay recognition module, configured to input the target face brushing image and the target mask image into a willingness-to-pay recognition model to obtain a gazing feature and a limb behavior feature, and output a recognition result corresponding to the target face brushing user based on the gazing feature and the limb behavior feature; the willingness-to-pay recognition model is obtained by training face brushing images of known willingness-to-pay information corresponding to a plurality of face brushing users.
In a possible implementation manner, the willingness-to-pay recognition apparatus further includes:
the first determining module is used for determining a target human body area image of the target face brushing user based on the face area of the target face brushing user and the target face brushing image; the target body area image comprises the limb of the target face brushing user;
the willingness-to-pay identification module is specifically configured to: inputting the target human body area image and the target mask image into a willingness-to-pay recognition model to obtain gazing characteristics and body behavior characteristics, and outputting a recognition result corresponding to the target face brushing user based on the gazing characteristics and the body behavior characteristics.
In a possible implementation manner, the willingness-to-pay recognition module includes:
the extraction unit is used for extracting a first feature corresponding to the target human body area image;
a fusion unit configured to fuse the first feature and the target mask map to generate a second feature;
a first determining unit, configured to determine a limb behavior feature corresponding to the target human body region image based on the first feature;
a second determination unit, configured to determine a gaze feature corresponding to the target human body region image based on the second feature;
and a third determining unit, configured to determine, according to the gaze feature and the body behavior feature, a recognition result corresponding to the target face brushing user.
In a possible implementation manner, the recognition result includes a willingness-to-pay recognition result;
the willingness-to-pay recognition device further comprises:
and the second determination module is used for determining whether the target face brushing user has a willingness to pay or not based on the identification result.
In a possible implementation manner, the recognition result further includes a gaze recognition result and a payment behavior recognition result;
the second determining module is specifically configured to:
and under the condition that the recognition result of the willingness-to-pay does not meet the preset condition, determining whether the target face brushing user has the willingness-to-pay according to the watching recognition result and/or the payment behavior recognition result.
In one possible implementation, the target face brushing image includes a plurality of face brushing users, and the plurality of face brushing users includes the target face brushing user;
the willingness-to-pay recognition device further comprises:
and the third determining module is used for determining a target face brushing user from the plurality of face brushing users according to a preset rule.
In a possible implementation manner, the recognition result includes a willingness-to-pay recognition result; the willingness-to-pay identification model comprises a first target convolutional network, a second target convolutional network and a first full connection layer;
the first target convolutional network is configured to process the target human body region image and the target mask map to obtain a gazing feature corresponding to the target human body region image;
the second target convolutional network is configured to process the target human body area image to obtain a limb behavior characteristic corresponding to the target human body area image;
and the first full connection layer is used for fusing the watching characteristics and the limb behavior characteristics and outputting a payment intention recognition result corresponding to the target face brushing user.
In a possible implementation manner, the first target convolution network includes a first convolution module, a second convolution module, and a third convolution module;
the first convolution module is used for extracting a first feature corresponding to the target human body area image;
the second convolution module is configured to fuse the first feature and the target mask map to generate a second feature;
and the third convolution module is configured to generate a gazing feature corresponding to the target human body region image based on the second feature.
In a possible implementation manner, the second target convolutional network includes a first convolutional module and a fourth convolutional module;
the first convolution module is configured to extract a first feature corresponding to the target human body region image;
the fourth convolution module is configured to generate a limb behavior feature corresponding to the target human body region image based on the first feature.
In a possible implementation manner, the recognition result further includes a gaze recognition result and a payment behavior recognition result; the willingness-to-pay recognition model further comprises a second fully connected layer and a third fully connected layer;
the second fully connected layer is configured to output the gaze recognition result corresponding to the target face brushing user based on the gazing feature;
the third fully connected layer is configured to output the payment behavior recognition result corresponding to the target face brushing user based on the limb behavior feature.
In a possible implementation manner, the first target convolutional network is obtained by training based on human body region images of known willingness-to-pay information and/or gazing information corresponding to the plurality of face-brushing users and a mask map;
the second target convolutional network is obtained by training based on the human body region images of the known willingness-to-pay information and/or payment behavior information corresponding to the plurality of face brushing users.
In a possible implementation, the resolution of the target human body region image is the same as the resolution of the target mask image.
In a third aspect, an embodiment of the present specification provides an electronic device, including: a processor and a memory;
the processor is connected with the memory;
the memory is used for storing executable program codes;
the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the method provided by the first aspect of the embodiments of the present specification or any one of the possible implementation manners of the first aspect.
In a fourth aspect, an embodiment of the present specification provides a computer storage medium, where multiple instructions are stored, and the instructions are adapted to be loaded by a processor and execute a method provided by the first aspect of the embodiment or any one of the possible implementation manners of the first aspect.
In a fifth aspect, the present specification provides a computer program product containing instructions, which when run on a computer or a processor, causes the computer or the processor to execute the method for identifying a willingness-to-pay provided in the first aspect of the present specification or any one of the possible implementations of the first aspect.
In the embodiments of this specification, a target face brushing image containing a target face brushing user is acquired; a corresponding target mask map is generated based on the target position of the target face brushing user in the image, the mask map distinguishing the face area of the user from the other areas outside the face area; the target face brushing image and the target mask map are then input into a willingness-to-pay recognition model to obtain a gazing feature and a limb behavior feature, and a recognition result corresponding to the target face brushing user is output based on these two features; the model is trained on face brushing images with known willingness-to-pay information corresponding to a plurality of face brushing users. Through the end-to-end learning of the willingness-to-pay recognition model, the recognition result is determined not from the gazing feature of the target face brushing user alone, but from the combination of the gazing feature and the limb behavior feature, which improves the accuracy of willingness-to-pay recognition. In a face brushing payment scenario, the more accurately recognized willingness to pay therefore alleviates the problem of fraudulent or mistaken brushing of non-paying users when face brushing devices are used in offline public places, safeguards the security of face brushing payment, and improves the user's sense of security in face brushing payment.
Drawings
To illustrate the technical solutions in the embodiments of this specification more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of this specification, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an architecture of a willingness-to-pay recognition system according to an exemplary embodiment of the present disclosure;
fig. 2A-2B are schematic diagrams illustrating an application scenario of willingness-to-pay recognition according to an exemplary embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a willingness-to-pay recognition method according to an exemplary embodiment of the present disclosure;
fig. 4 is a flowchart illustrating another willingness-to-pay identification method according to an exemplary embodiment of the present disclosure;
fig. 5 is a schematic diagram illustrating an implementation process of determining a target human body region and a target mask map according to an exemplary embodiment of the present disclosure;
fig. 6 is a schematic flow chart of an implementation of willingness-to-pay recognition according to an exemplary embodiment of the present disclosure;
fig. 7 is a schematic diagram illustrating an implementation process of willingness-to-pay identification according to an exemplary embodiment of the present disclosure;
fig. 8 is a schematic diagram of an implementation process of another willingness-to-pay recognition provided in an exemplary embodiment of the present specification;
fig. 9 is a schematic diagram of an implementation process of determining, according to a recognition result, whether a target face brushing user has a willingness to pay, provided in an exemplary embodiment of the present specification;
fig. 10 is a schematic structural diagram of a willingness-to-pay recognition model according to an exemplary embodiment of the present specification;
fig. 11 is a schematic structural diagram of a willingness-to-pay recognition device according to an exemplary embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure.
The terms "first," "second," "third," and the like in the description and in the claims, as well as in the drawings described above, are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an architecture of a willingness-to-pay recognition system according to an exemplary embodiment of the present disclosure. As shown in fig. 1, the willingness-to-pay recognition system may include: a brushing device 110 and a server 120. Wherein:
the face brushing device 110 may be a device such as a mobile phone, a tablet computer, or a notebook computer equipped with a user version software and a camera, or may also be another device such as an Internet of Things (IOT) face brushing device equipped with a camera and having a payment function, which is not limited in this description embodiment.
Optionally, when the face brushing device 110 acquires the target face brushing image, in order to avoid the face of the target face brushing user being fraudulently or mistakenly used for payment, the device may generate a corresponding target mask map based on the target position of the target face brushing user in the image, the mask map distinguishing the face area of the user from the other areas outside the face area. The device then inputs the target face brushing image and the target mask map into a willingness-to-pay recognition model to obtain a gazing feature and a limb behavior feature, and outputs a recognition result corresponding to the target face brushing user based on those features; the model is trained on face brushing images with known willingness-to-pay information corresponding to a plurality of face brushing users. Finally, the device learns from the recognition result whether the target face brushing user has a willingness to pay, so as to decide whether to charge that user and to prevent the user's face from being fraudulently or mistakenly brushed by others.
It can be appreciated that, after the face brushing device 110 collects the target face brushing image, performing the willingness-to-pay recognition directly on the device means that the image does not need to be transmitted elsewhere; that is, the device is not subject to network limitations when performing the recognition alone, which ensures the efficiency and feasibility of willingness-to-pay recognition.
Optionally, when the face brushing device 110 collects the target face brushing image, in order to avoid another user's face being fraudulently or mistakenly charged for the current payment, the device may establish a data connection with the server 120 over a network, for example sending the target face brushing image to the server 120 and then receiving the recognition result that the server 120 determines for the target face brushing user based on that image. Finally, the device learns from the recognition result whether the target face brushing user has a willingness to pay, so as to decide whether to charge that user and to prevent the user's face from being fraudulently or mistakenly brushed by others.
It can be appreciated that the server 120 is a high-performance computer with strong data processing capability and high stability and reliability. Therefore, after the face brushing device 110 collects the target face brushing image, sending the image to the server 120 over the network for willingness-to-pay recognition, i.e., performing the recognition jointly by the device and the server, avoids the inaccurate or slow recognition that a low device configuration can cause compared with performing the recognition directly on the device, and also ensures, to a certain extent, the stability and accuracy of the recognition, thereby providing a stronger security guarantee for face brushing payment.
Optionally, when the recognition result corresponding to the target face brushing user shows that the user has a willingness to pay, the face brushing device 110 may determine that the target face brushing user is the payer and, after the face brushing payment is completed, send the corresponding payment information and the like to the terminal of the target face brushing user through the network.
The server 120 may be a server capable of providing multiple kinds of willingness-to-pay recognition. It may receive, over the network, data such as the target face brushing image sent by the face brushing device 110, where the image contains the target face brushing user; generate a corresponding target mask map based on the target position of the target face brushing user in the image, the mask map distinguishing the face area of the user from the other areas outside the face area; input the image and the mask map into a willingness-to-pay recognition model to obtain a gazing feature and a limb behavior feature; and output a recognition result corresponding to the target face brushing user based on those features. The model is trained on face brushing images with known willingness-to-pay information corresponding to a plurality of face brushing users.
Specifically, after completing the willingness-to-pay recognition for the target face brushing user in the target face brushing image, the server 120 may send the recognition result to the face brushing device 110 through the network, so that the device can determine, according to the result, whether the target face brushing user is the payer. When the server 120 determines that the target face brushing user has a willingness to pay, it may also send corresponding payment information or a payment prompt to the terminal of that user, to notify the user that a face brushing payment has been made or to confirm whether one is in progress.
Specifically, the server 120 may be, but is not limited to, a hardware server, a virtual server, a cloud server, and the like.
The network may be a medium that provides a communication link between the server 120 and any one of the face brushing devices 110, or may be the internet comprising network devices and transmission media, without limitation. The transmission medium may be a wired link (such as, but not limited to, coaxial cable, optical fiber and digital subscriber line (DSL)) or a wireless link (such as, but not limited to, wireless fidelity (WiFi), Bluetooth and mobile device networks).
It is to be appreciated that the numbers of face brushing devices 110 and servers 120 in the willingness-to-pay recognition system shown in fig. 1 are merely examples; in a particular implementation, the system may contain any number of face brushing devices and servers, which the embodiments of this specification do not limit. For example, but without limitation, the face brushing device 110 may be a cluster composed of multiple face brushing devices, and the server 120 may be a cluster composed of multiple servers.
Referring to fig. 2A-2B, fig. 2A-2B are schematic diagrams of an application scenario of a willingness-to-pay recognition method provided in an exemplary embodiment of this specification; the face brushing device 110 in fig. 1 may be the self-service purchase machine 210 in fig. 2A and 2B. After selecting the goods to purchase on the self-service purchase machine 210, the user 220 can click the "face-brushing payment" control 212 displayed on the machine's screen in fig. 2A to pay by face. After the control 212 is triggered, as shown in fig. 2B, the self-service purchase machine 210 starts to collect a face brushing image 230 through its camera 211 and performs face recognition and willingness-to-pay recognition based on the collected image 230, so as to determine the identity of the paying user, whether that user has a willingness to pay, and so on. However, multiple users often stand in front of the self-service purchase machine 210 at once, i.e., multiple users appear in the collected face brushing image. In that case, the machine may mistakenly charge the face of another user near user 220 after user 220 initiates payment, or user 220 may fraudulently pay with the face of the user behind them while that user is not paying attention. Both situations easily give rise to public concern over the safety of face brushing payment, whose security then cannot be effectively guaranteed.
In order to solve the above problems, a willingness-to-pay recognition method provided by an embodiment of this specification is described next with reference to fig. 1 to 2B. Refer specifically to fig. 3, which is a flowchart of a willingness-to-pay recognition method provided in an exemplary embodiment of this specification. As shown in fig. 3, the willingness-to-pay recognition method includes the following steps:
and S302, acquiring a target face brushing image.
Specifically, when a user triggers the face brushing device 110 to perform face brushing payment, a target face brushing image containing the target face brushing user can be collected through the camera installed on the device. If the server 120 performs the willingness-to-pay recognition, the face brushing device 110 needs to transmit the collected image to the server 120 through the network; the server 120 then acquires the image sent by the device and performs the willingness-to-pay recognition based on it. If the face brushing device 110 itself performs the recognition, it may start the recognition directly after collecting the target face brushing image through its camera. The following embodiments are all described taking the case where the face brushing device 110 performs the willingness-to-pay recognition as an example.
It can be appreciated that, when multiple users stand in front of the camera of the face brushing device 110, the target face brushing image may also contain multiple users. The target face brushing user among them can be understood as the user in the image who is most likely the subject of face recognition, such as, but not limited to, the user closest to the device, or the user occupying the largest area or the most central position in the image.
It can be understood that the target face brushing user in the target face brushing image may be the user who triggered the device and actually needs to pay by face, or may be another user who merely stands in front of the camera and has no willingness to pay. To prevent a target face brushing user who has no willingness to pay from being mistakenly or fraudulently charged simply because they appear in front of the camera, it is necessary, when performing face brushing payment, to first determine whether the target face brushing user has a willingness to pay and only then decide whether to charge that user by face.
S304, generating a corresponding target mask map based on the target position of the target face brushing user in the target face brushing image.
Specifically, after the target face brushing image is collected, a face detection algorithm may be used to detect the area of the image where the face of the target face brushing user is located and to determine the target position of that user. A target mask map corresponding to the target face brushing user is then generated according to the target position, so as to distinguish the face area of the target face brushing user from the other areas outside the face area and make the facial feature information of the user more salient.
For example, after the target face brushing image is acquired and the target position of the target face brushing user is determined, the face area of the user in the image may be filled with the value 1 and the other areas with the value 0 (other fill values with higher contrast may also be used), thereby generating a target mask map that distinguishes the face area of the target face brushing user from the areas outside the face area.
It can be understood that the face area of the target face brushing user represented by the target position may be a regular area such as a rectangle or a circle, or any irregular area, which this specification does not limit.
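As an illustration of the fill-value scheme above, the following Python sketch builds a mask map for a circular face area; a rectangular or irregular area would be filled in the same way:

    import numpy as np

    def circular_face_mask(h, w, cx, cy, r):
        # Fill value 1 inside the circular face area, 0 everywhere else.
        ys, xs = np.mgrid[0:h, 0:w]
        inside = (xs - cx) ** 2 + (ys - cy) ** 2 <= r ** 2
        return inside.astype(np.float32)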
S306, inputting the target face brushing image and the target mask image into the willingness-to-pay recognition model to obtain the gazing characteristic and the limb behavior characteristic, and outputting a recognition result corresponding to the target face brushing user based on the gazing characteristic and the limb behavior characteristic.
Specifically, the willingness-to-pay recognition model is trained on face brushing images with known willingness-to-pay information corresponding to a plurality of face brushing users. The willingness-to-pay information comprises a willingness-to-pay label for face brushing users who have a willingness to pay and a no-willingness-to-pay label for face brushing users who do not.
Specifically, in order to enable the willingness-to-pay recognition model to extract the gazing feature of the target face brushing image more accurately through the target mask map, the target face brushing image and the target mask map input into the model should have the same resolution and size, which makes it easier for the model to extract the corresponding gazing feature from the face area of the target face brushing user in the target face brushing image.
Specifically, the recognition result corresponding to the target face brushing user may include a willingness-to-pay recognition result. The willingness-to-pay recognition result comprises the probability, as identified by the willingness-to-pay recognition model, that the target face brushing user has and/or does not have a willingness to pay.
Specifically, in an offline face brushing payment scenario, when a user truly wants to pay by face, the user not only gazes at the camera or screen of the face brushing device 110 but also performs the action that triggers face brushing payment, i.e., a payment action such as clicking the device with a hand. Therefore, recognizing the willingness to pay only from the face area of the target face brushing user, i.e., only from whether the user gazes at the camera or screen of the device, cannot adequately prevent fraudulent or mistaken brushing.
In an actual offline face brushing payment scenario, the target face brushing image collected by the face brushing device 110 may contain multiple face brushing users. The image contains not only the face area of the target face brushing user but also the limb areas of that user, such as the upper or lower limbs, and may further contain other objects that need no recognition, such as sundries, tables and chairs in the environment where the user is located. These extra objects can interfere to a certain extent with recognizing the willingness to pay of the target face brushing user in the image. To recognize the willingness to pay more accurately, better prevent fraudulent or mistaken brushing, and further improve the security of offline face brushing payment, another willingness-to-pay recognition method provided by an embodiment of this specification is introduced next with reference to fig. 4. As shown in fig. 4, the willingness-to-pay recognition method includes the following steps:
S402, acquiring a target face brushing image.
Specifically, besides the target face brushing user, the target face brushing image may contain multiple other face brushing users and/or objects that need no recognition, such as sundries, tables and chairs in the environment where the target face brushing user is located; the embodiments of this specification do not limit this.
Optionally, when the target face brushing image contains multiple face brushing users, after the image is acquired, the target face brushing user may be determined from the multiple face brushing users according to a preset rule. The preset rule may be to select, from the multiple face brushing users, the one closest to the face brushing device, or the one occupying the largest area or the most central position in the image, as the target face brushing user; the target face brushing user may also be determined in other ways, which the embodiments of this specification do not limit.
S404, generating a corresponding target mask image based on the target position of the target face brushing user in the target face brushing image.
Specifically, after the target face brushing image is acquired, a face detection algorithm may be used to detect the area of the image where the face of the target face brushing user is located and to determine the target position of that face area; a mask map corresponding to the target face brushing user is then generated according to the target position, so that the face area of the target face brushing user can be distinguished from the other areas of the image. Because the target face brushing image may contain, besides the target face brushing user, multiple other face brushing users and/or objects that need no recognition, the areas of the image outside the target face brushing user need not be analyzed when recognizing the willingness to pay. That is, the target human body area corresponding to the target face brushing user can be located in the mask map through the target position of the user's face area, and the mask map is then cropped to that area to obtain the target mask map.
It can be understood that, when face brushing payment is made on an offline Internet of Things (IOT) face brushing machine (face brushing device) deployed in public consumption venues such as supermarkets, convenience stores, restaurants, hotels, campuses and hospitals, the collected target face brushing image contains not only the face of the face brushing user but also parts of the user's limbs; that is, besides the face area of the target face brushing user, the target human body area can include the area where the limbs of the target face brushing user are located.
For example, as shown in the left diagram of fig. 5, suppose the face area of the target face brushing user in the mask map is the inscribed circle of a rectangular frame whose upper-left vertex is point A and whose lower-right vertex is point B, with A = (x1, y1), B = (x2, y2) and the radius of the face area R. The position of the target human body area can then be calculated from the coordinates of points A and B. As shown in the same diagram, the target human body area is the rectangle whose upper-left vertex is point C = (x3, y3) and whose lower-right vertex is point D = (x4, y4), where x3 = x1 − 2R, x4 = x2 + 2R, y3 = y1 − R and y4 = y2 + 4R; that is, the rectangular frame corresponding to the face area of the target user is expanded by a certain length in each of the four directions (up, down, left and right) to obtain the target human body area. The mask map is then cropped at the position of the target human body area to obtain the target mask map shown in the right diagram of fig. 5.
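Written out in Python, the box expansion of this example is as follows; the clamping to the image borders is an added assumption for robustness at the image edge:

    def expand_to_body_region(face_box, R, img_h, img_w):
        # A = (x1, y1), B = (x2, y2) bound the face area of radius R;
        # C = (x3, y3), D = (x4, y4) bound the target human body area.
        x1, y1, x2, y2 = face_box
        x3, y3 = max(0, x1 - 2 * R), max(0, y1 - R)
        x4, y4 = min(img_w, x2 + 2 * R), min(img_h, y2 + 4 * R)
        return x3, y3, x4, y4

    def crop_target_mask(mask, face_box, R):
        # Crop the mask map at the body area to get the target mask map.
        h, w = mask.shape[:2]
        x3, y3, x4, y4 = expand_to_body_region(face_box, R, h, w)
        return mask[y3:y4, x3:x4]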
It should be understood that the method for determining the target human body area of the target face brushing user in the mask map is not limited to the one shown in fig. 5; the area may also be determined directly by other methods such as key point detection, which the embodiments of this specification do not limit.
It can be understood that the target mask map in S304 has the same size as the target face brushing image, whereas the target mask map in S404 has the same size as the target human body area corresponding to the target face brushing user in the image. Thus, when multiple face brushing users or other objects that need no recognition appear in the target face brushing image, the willingness-to-pay recognition model can concentrate on analyzing the features corresponding to the target face brushing user without interference from other information, which further improves the accuracy and efficiency of willingness-to-pay recognition and safeguards the security of offline face brushing payment.
S406, determining a target human body area image of the target face brushing user based on the face area of the target face brushing user and the target face brushing image.
Specifically, to avoid information in the target face brushing image other than the target face brushing user affecting the recognition of that user's willingness to pay, and to improve the efficiency and accuracy of the recognition, after the image is collected and the target face brushing user to be recognized is determined, the face area of the target face brushing user in the image can be expanded outward according to a certain rule to obtain the target human body area. The part of the image inside the target human body area is then cropped out and subjected to a certain scaling transformation, yielding a target human body area image whose resolution is consistent with that of the target mask map (or of the target face brushing image). The target human body area image contains not only the face of the target face brushing user but also the limbs of that user.
S408, inputting the target human body area image and the target mask image into the willingness-to-pay recognition model to obtain the gazing feature and the limb behavior feature, and outputting a recognition result corresponding to the target face brushing user based on the gazing feature and the limb behavior feature.
Specifically, the willingness-to-pay recognition model is trained on the human body area images corresponding to the face brushing images with known willingness-to-pay information of a plurality of face brushing users.
It can be understood that, in order to better fuse the features of the target mask map and the target human body area image and obtain features that are more precise and more focused on the face of the target face brushing user, so as to determine the final gazing feature, the target human body area image and the target mask map should have the same resolution and size.
Optionally, as shown in fig. 6, the implementation process of outputting the recognition result corresponding to the target face-brushing user by the willingness-to-pay recognition model in S408 may include the following steps:
S602, extracting a first feature corresponding to the target human body area image.
Specifically, the target human body area image includes the face and the limbs of the target face brushing user; the limbs include at least an upper limb, and the upper limb may include a left hand and/or a right hand, which is not limited in this specification. The first feature is the basic feature corresponding to the target face brushing user in the target human body area image.
S604, fusing the first feature and the target mask image to generate a second feature.
Specifically, in order to directly analyze (identify), on the basis of the target human body area image, whether the target face brushing user gazes at the screen or camera of the face brushing device, a first target mask image with the same resolution as the first feature can be generated by nearest-neighbor sampling; the first feature and the first target mask image are then concatenated along the channel dimension; finally, a fusion convolutional network module fuses the first target mask image with the first feature corresponding to the target human body area image, yielding a second feature that attends more to the face of the target face brushing user.
It is to be understood that the above-mentioned fusion convolutional network module may consist of one or more convolutional layers, which is not limited in this specification.
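A minimal sketch of the S604 fusion in PyTorch follows; the channel size and the single-layer fusion module are assumptions for illustration, not the implementation of this embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskFusion(nn.Module):
    """Fuse the first feature with the target mask image (S604 sketch)."""

    def __init__(self, feat_channels=256):
        super().__init__()
        # Fusion convolutional network module: one or more conv layers
        # mapping (C + 1) channels back to C channels.
        self.fuse = nn.Sequential(
            nn.Conv2d(feat_channels + 1, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, first_feature, target_mask):
        # first_feature: N x C x h x w; target_mask: N x 1 x H x W (0/1 map).
        # Nearest-neighbor sampling brings the mask to the feature
        # resolution, producing the "first target mask image".
        mask = F.interpolate(target_mask, size=first_feature.shape[-2:],
                             mode="nearest")
        # Concatenate along the channel dimension, then fuse back to C
        # channels so the second feature keeps the first feature's dimension.
        return self.fuse(torch.cat([first_feature, mask], dim=1))
```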
S606, determining the limb behavior feature corresponding to the target human body area image based on the first feature.
Specifically, the limb behavior characteristics include characteristics corresponding to the behavior of the upper limb of the target face brushing user in the target human body region image.
It should be understood that the behavior of the upper limb may be a state in which the upper limb (hand) is raised in a clicking action, as when the target face brushing user taps the face brushing device to trigger face brushing payment, or a state in which the target face brushing user does not interact with the face brushing device at all, that is, the upper limb is not raised or performs no clicking action; the embodiments of the present specification do not limit this.
S608, determining the gazing feature corresponding to the target human body area image based on the second feature.
Specifically, the second feature obtained by fusing the first feature and the target mask image has the same dimension as the first feature. The gazing feature corresponding to the target human body area image can therefore be determined from the second feature, which attends more to the face area of the target face brushing user, through one or more convolution modules of the same willingness-to-pay recognition model.
S610, determining the corresponding recognition result of the target face brushing user according to the gazing feature and the limb behavior feature.
Specifically, after the gazing feature and the limb behavior feature of the target face brushing user in the target human body area image are determined, the two features can be concatenated to generate a fusion feature; the willingness to pay of the target face brushing user is then recognized through a fully connected layer, yielding the recognition result corresponding to the target face brushing user.
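The concatenate-and-classify head of S610 might look like the following sketch; the feature dimensions and the two-class output are assumptions for illustration.

```python
import torch
import torch.nn as nn

class WillingnessHead(nn.Module):
    """S610 sketch: concatenate the two features, then classify."""

    def __init__(self, gaze_dim=128, limb_dim=128):
        super().__init__()
        # Fully connected layer mapping the fusion feature to
        # (willingness, non-willingness) logits.
        self.fc = nn.Linear(gaze_dim + limb_dim, 2)

    def forward(self, gaze_feature, limb_feature):
        # Concatenation of the gazing and limb behavior features
        # produces the fusion feature.
        fused = torch.cat([gaze_feature, limb_feature], dim=1)
        # Probabilities of having / not having a willingness to pay.
        return torch.softmax(self.fc(fused), dim=1)
```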
Optionally, in order to improve the efficiency of willingness-to-pay recognition, the willingness-to-pay recognition flow shown in fig. 4 (that is, the process shown in fig. 7) determines both the gazing feature and the limb behavior feature directly on the basis of the target human body area image corresponding to the target face brushing user in the target face brushing image, and then determines the recognition result from the gazing feature and the limb behavior feature. This improves the accuracy of willingness-to-pay recognition while reducing the amount of computation in the recognition process, thereby improving its efficiency.
Optionally, in addition to the willingness-to-pay recognition process shown in fig. 7, the process shown in fig. 8 may also be used: a target face image and a target human body area image of the target face brushing user are cropped directly from the acquired target face brushing image, a gazing feature and a limb behavior feature are then obtained from the target face image and the target human body area image respectively, and the recognition result is finally determined from the two features. In the process shown in fig. 8, because the target face image and the target human body area image are independent images, the gazing feature can be extracted from the target face image directly through a pre-trained convolution module, and no additional target mask image needs to be fused with the features of the target human body area image. In other words, the process shown in fig. 8 is simpler than that shown in fig. 7 and is easier to implement.
Further, the gazing feature and the limb behavior feature may be obtained from the target face image and the target human body area image by a gaze recognition model and a payment behavior recognition model, respectively.
It can be understood that the target face image and the target human body area image cropped from the target face brushing image can also be input directly into a trained gaze recognition model and a trained payment behavior recognition model, respectively, which output a gaze recognition result and a payment behavior recognition result; whether the target face brushing user has a willingness to pay is then judged jointly from the two results.
It can be understood that, since whether the target face brushing user gazes at the screen or camera of the face brushing device only needs to be determined from the face area of the target face brushing user in the target face brushing image, the target face image can be replaced directly by a target eye image; the gazing direction or gazing state of the target face brushing user is then analyzed from the target eye image, thereby realizing gaze recognition for the target face brushing user in the target face brushing image.
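The fig. 8 style pipeline could be sketched as below, assuming two independently trained models that each map an image tensor to a scalar probability; the model names and thresholds are hypothetical.

```python
import torch

def recognize_fig8(face_crop, body_crop, gaze_model, behavior_model,
                   threshold=0.95):
    """Fig. 8 sketch: two independent models whose results are combined.

    face_crop could equally be an eye-area crop, as noted above.
    gaze_model and behavior_model are assumed pre-trained modules.
    """
    with torch.no_grad():
        p_gaze = float(gaze_model(face_crop))       # P(gazes at screen/camera)
        p_click = float(behavior_model(body_crop))  # P(payment action occurs)
    # Joint judgment: both signals must agree before face recognition
    # and payment proceed (the threshold is illustrative).
    return p_gaze > threshold and p_click > threshold
```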
Optionally, when several face brushing users appear in the target face brushing image, in order to ensure the validity of face brushing payment as far as possible, willingness-to-pay recognition may be performed not only on a single target face brushing user but on each of the face brushing users, or on those satisfying a preset recognition condition; the face brushing user with the highest willingness to pay is then selected for face recognition and payment. The preset recognition condition may be, but is not limited to, that the face brushing user has a complete face in the target face brushing image, that the face brushing user is within a preset distance from the face brushing device, or that the area the face brushing user occupies in the target face brushing image is larger than a preset area.
Optionally, the recognition result includes a willingness-to-pay recognition result. After S306 or S408, that is, after the recognition result corresponding to the target face brushing user is output, whether the target face brushing user has a willingness to pay may further be determined based on the recognition result. The willingness-to-pay recognition result includes a probability of having a willingness to pay and a probability of not having a willingness to pay (non-willingness-to-pay).
Optionally, determining whether the target face brushing user has a willingness to pay based on the recognition result may mean deciding according to the willingness-to-pay recognition result when that result satisfies a preset condition. For example, but not limited to: when the probability of having a willingness to pay is greater than a first preset threshold and/or the probability of non-willingness-to-pay is less than a second preset threshold, the target face brushing user is determined to have a willingness to pay; when the probability of having a willingness to pay is less than the first preset threshold and/or the probability of non-willingness-to-pay is greater than the second preset threshold, the target face brushing user is determined not to have a willingness to pay. The first preset threshold may be 0.8, 0.9, etc., and the second preset threshold may be 0.1, 0.2, etc., i.e., smaller than the first preset threshold; the embodiments of the present specification do not limit this.
Optionally, the recognition result may further include a gaze recognition result and a payment behavior recognition result. The gaze recognition result includes the probability that the target face brushing user gazes at the screen or camera of the face brushing device and/or the probability that the user does not gaze at them; the payment behavior recognition result includes the probability that the target face brushing user has a payment behavior and/or the probability that the user has no payment behavior. Determining whether the target face brushing user has a willingness to pay based on the recognition result may further mean deciding according to the gaze recognition result and/or the payment behavior recognition result when the willingness-to-pay recognition result does not satisfy the preset condition. For example, but not limited to: when the probability of having a willingness to pay is not within a preset range, the target face brushing user is determined to have a willingness to pay if the gaze recognition result meets a preset gazing condition and/or the payment behavior recognition result meets a preset payment behavior condition, and is determined not to have a willingness to pay if the gaze recognition result does not meet the preset gazing condition and the payment behavior recognition result does not meet the preset payment behavior condition. The preset range may be, for example, below 0.1 or above 0.8 (or below 0.2 or above 0.9), so that a probability falling between the bounds counts as ambiguous; the preset gazing condition may be that the probability of the target face brushing user gazing at the screen or camera of the face brushing device is greater than 0.99, 0.95, etc.; and the preset payment behavior condition may be that the probability of the target face brushing user having a payment behavior is greater than 0.99, 0.95, etc. The embodiments of the present specification do not limit this.
For example, when the recognition result includes a willingness-to-pay recognition result, a payment behavior recognition result, and a gaze recognition result, whether the target face brushing user has a willingness to pay may be determined according to the process shown in fig. 9. In this case, the preset condition in fig. 9 may be that the probability of the target face brushing user having a willingness to pay in the willingness-to-pay recognition result is greater than 0.8, 0.9, etc., and the preset requirement may be that the probability of the target face brushing user gazing at the screen or camera of the face brushing device in the gaze recognition result is greater than 0.99, 0.95, etc., or that the probability of the target face brushing user having a payment behavior in the payment behavior recognition result is greater than 0.99, 0.95, etc.; the embodiments of the present specification do not limit this.
In the embodiments of the present specification, when the willingness-to-pay recognition result does not satisfy the preset condition, that is, when the willingness to pay indicated by that result is ambiguous and it is difficult to judge accurately from it alone whether the target face brushing user has a willingness to pay, the gaze recognition result and/or the payment behavior recognition result can be combined for a comprehensive judgment. For example, but not limited to: if the willingness-to-pay recognition result gives the target face brushing user a willingness probability of 0.5 while the payment behavior recognition result gives a payment behavior probability of 1, the latter indicates a strong willingness to pay, and the target face brushing user can be determined directly to have a willingness to pay. Willingness-to-pay recognition can thus be realized more accurately, further improving the safety of face brushing payment.
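The decision logic described in the last few paragraphs can be summarized in a short sketch; the thresholds follow the examples above (0.1/0.8 bounds, 0.95 auxiliary conditions) but are configurable, and this is not the definitive rule of the embodiment.

```python
def has_willingness(p_pay, p_gaze=None, p_click=None,
                    low=0.1, high=0.8, aux=0.95):
    """Decide willingness to pay, falling back to auxiliary results.

    p_pay:   probability of willingness from the willingness-to-pay result.
    p_gaze:  probability that the user gazes at the screen/camera (optional).
    p_click: probability that the user performs a payment behavior (optional).
    """
    # Confident cases: the willingness-to-pay result decides by itself.
    if p_pay > high:
        return True
    if p_pay < low:
        return False
    # Ambiguous middle (e.g. p_pay == 0.5): combine the gaze and/or
    # payment behavior results for a comprehensive judgment.
    if p_gaze is not None and p_gaze > aux:
        return True
    if p_click is not None and p_click > aux:
        return True
    return False
```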
Next, a willingness-to-pay recognition model provided in an embodiment of the present specification is described with reference to fig. 10. As shown in fig. 10, when the recognition result in S408 includes a willingness-to-pay recognition result, the willingness-to-pay recognition model includes a first target convolutional network, a second target convolutional network, and a first fully connected layer. Wherein:
The first target convolutional network is used for processing the target human body area image and the target mask image to obtain the gazing feature corresponding to the target human body area image.
Specifically, the first target convolutional network comprises a first convolution module, a second convolution module and a third convolution module. The first convolution module is used for extracting the first feature corresponding to the target human body area image; the second convolution module is used for fusing the first feature and the target mask image to generate the second feature; and the third convolution module is used for generating the gazing feature corresponding to the target human body area image based on the second feature.
Specifically, before fusing the first feature and the target mask image, the second convolution module adjusts the resolution of the target mask image to match that of the first feature, obtaining the first target mask image, and then splices the first feature and the first target mask image along the channel dimension. Because splicing increases the number of channels, in order to ensure that the subsequent processing can proceed normally, the spliced first feature and first target mask image are fused through the second convolution module, generating a second feature with the same dimension as the first feature.
Optionally, the first convolution module and the third convolution module may be obtained by splitting from the same convolution network.
The second target convolutional network is used for processing the target human body area image to obtain the limb behavior feature corresponding to the target human body area image.
Specifically, the second target convolutional network includes the first convolution module and a fourth convolution module. The first convolution module is used for extracting the first feature corresponding to the target human body area image; the fourth convolution module is used for generating the limb behavior feature corresponding to the target human body area image based on the first feature.
Optionally, the first convolution module and the fourth convolution module may be obtained by splitting the same convolutional network. The third convolution module and the fourth convolution module may have the same structure but different parameters.
The first fully connected layer is used for fusing the gazing feature and the limb behavior feature and outputting the willingness-to-pay recognition result corresponding to the target face brushing user.
Optionally, the recognition result in S408 may include a gaze recognition result and a payment behavior recognition result in addition to the willingness-to-pay recognition result. In this case, as shown in fig. 10, the willingness-to-pay recognition model further includes a second fully connected layer and a third fully connected layer. The second fully connected layer is used for outputting the gaze recognition result corresponding to the target face brushing user based on the gazing feature; the third fully connected layer is used for outputting the payment behavior recognition result corresponding to the target face brushing user based on the limb behavior feature.
Optionally, the first target convolutional network shown in fig. 10 may be trained on human body area images with known willingness-to-pay information and/or gaze information corresponding to a plurality of face brushing users, together with mask maps; the second target convolutional network is trained on human body area images with known willingness-to-pay information and/or payment behavior information corresponding to the plurality of face brushing users. The willingness-to-pay information includes a willingness-to-pay label for face brushing users who have a willingness to pay in the face brushing image and a non-willingness-to-pay label for those who do not. The gaze information includes a gazing label for face brushing users who gaze at the screen or camera of the face brushing device in the face brushing image and a non-gazing label for those who do not. The payment behavior information includes a payment behavior label for face brushing users who perform a payment behavior in the face brushing image, such as clicking with a limb, and a non-payment behavior label for those who do not.
It can be understood that the willingness-to-pay recognition model shown in fig. 10 may be trained as a whole, directly on face brushing images with known willingness-to-pay information corresponding to a plurality of face brushing users. Alternatively, the first target convolutional network may first be trained on the human body area images with known willingness-to-pay information and/or gaze information and the mask maps, after which the second target convolutional network is trained on the human body area images with known willingness-to-pay information and/or payment behavior information; or the second target convolutional network may be trained first and the first target convolutional network trained on top of it. Either order yields a trained willingness-to-pay recognition model.
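Putting the pieces together, the fig. 10 wiring (shared first convolution module, mask-fused gaze branch, limb branch, three output layers) might be sketched as follows. The channel sizes, pooling, and module depths are assumptions; only the wiring follows the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    # Stand-in for a "convolution module"; real modules may be deeper.
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.ReLU(inplace=True))

class WillingnessModel(nn.Module):
    """Sketch of the fig. 10 architecture (assumed sizes, real wiring)."""

    def __init__(self, c=64):
        super().__init__()
        self.conv1 = conv_block(3, c)      # first convolution module (shared)
        self.conv2 = conv_block(c + 1, c)  # second module: mask fusion
        self.conv3 = conv_block(c, c)      # third module -> gazing feature
        self.conv4 = conv_block(c, c)      # fourth module -> limb feature
        self.fc1 = nn.Linear(2 * c, 2)     # willingness-to-pay result
        self.fc2 = nn.Linear(c, 2)         # gaze recognition result
        self.fc3 = nn.Linear(c, 2)         # payment behavior result

    def forward(self, body_image, target_mask):
        f1 = self.conv1(body_image)                        # first feature
        m = F.interpolate(target_mask, size=f1.shape[-2:], mode="nearest")
        f2 = self.conv2(torch.cat([f1, m], dim=1))         # second feature
        gaze = self.conv3(f2).mean(dim=(2, 3))             # gazing feature
        limb = self.conv4(f1).mean(dim=(2, 3))             # limb feature
        pay = self.fc1(torch.cat([gaze, limb], dim=1))
        return {"pay": pay.softmax(dim=1),
                "gaze": self.fc2(gaze).softmax(dim=1),
                "behavior": self.fc3(limb).softmax(dim=1)}
```

Training can then supervise the three output layers jointly with cross-entropy against the willingness-to-pay, gaze, and payment behavior labels, or train the two branches separately in either order, as described above.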
Referring to fig. 11, fig. 11 is a diagram illustrating a device for recognizing willingness-to-pay according to an exemplary embodiment of the present disclosure. The willingness-to-pay recognition device 1100 includes:
an obtaining module 1110, configured to obtain a target face brushing image; the target face brushing image comprises a target face brushing user;
a generating module 1120, configured to generate a corresponding target mask map based on a target position of the target face brushing user in the target face brushing image; the target mask map is used for distinguishing the face area of the target face brushing user from other areas except the face area;
a willingness-to-pay recognition module 1130, configured to input the target face brushing image and the target mask image into a willingness-to-pay recognition model, obtain a gazing feature and a limb behavior feature, and output a recognition result corresponding to the target face brushing user based on the gazing feature and the limb behavior feature; the willingness-to-pay recognition model is obtained by training face brushing images of known willingness-to-pay information corresponding to a plurality of face brushing users.
In a possible implementation manner, the willingness-to-pay recognition device 1100 further includes:
the first determining module is used for determining a target human body area image of the target face brushing user based on the face area of the target face brushing user and the target face brushing image; the target body area image comprises the limb of the target face brushing user;
the willingness-to-pay recognition module 1130 is specifically configured to: inputting the target human body area image and the target mask image into a willingness-to-pay recognition model to obtain a gazing feature and a limb behavior feature, and outputting a recognition result corresponding to the target face brushing user based on the gazing feature and the limb behavior feature.
In one possible implementation manner, the willingness-to-pay recognition module 1130 includes:
an extraction unit, configured to extract a first feature corresponding to the target human body region image;
a fusion unit, configured to fuse the first feature and the target mask map to generate a second feature;
a first determining unit, configured to determine a limb behavior feature corresponding to the target human body area image based on the first feature;
a second determination unit configured to determine a gaze feature corresponding to the target human body region image based on the second feature;
and a third determining unit, configured to determine, according to the gazing feature and the body behavior feature, an identification result corresponding to the target face brushing user.
In a possible implementation manner, the recognition result includes a willingness-to-pay recognition result;
the willingness-to-pay recognition device 1100 further comprises:
and the second determination module is used for determining whether the target face brushing user has a willingness to pay or not based on the identification result.
In a possible implementation manner, the recognition result further includes a gaze recognition result and a payment behavior recognition result;
the second determining module is specifically configured to:
and under the condition that the recognition result of the willingness-to-pay does not meet the preset condition, determining whether the target face brushing user has willingness-to-pay according to the gaze recognition result and/or the payment behavior recognition result.
In one possible implementation, the target face brushing image includes a plurality of face brushing users, and the plurality of face brushing users includes the target face brushing user;
the willingness-to-pay recognition device 1100 further comprises:
and the third determining module is used for determining a target face brushing user from the plurality of face brushing users according to a preset rule.
In a possible implementation manner, the recognition result includes a willingness-to-pay recognition result; the willingness-to-pay recognition model comprises a first target convolutional network, a second target convolutional network and a first fully connected layer;
the first target convolutional network is configured to process the target human body region image and the target mask map to obtain a gazing feature corresponding to the target human body region image;
the second target convolutional network is configured to process the target human body area image to obtain a limb behavior characteristic corresponding to the target human body area image;
and the first fully connected layer is used for fusing the gazing feature and the limb behavior feature and outputting the willingness-to-pay recognition result corresponding to the target face brushing user.
In a possible implementation manner, the first target convolutional network includes a first convolutional module, a second convolutional module, and a third convolutional module;
the first convolution module is used for extracting a first feature corresponding to the target human body area image;
the second convolution module is configured to fuse the first feature and the target mask map to generate a second feature;
and the third convolution module is configured to generate a gazing feature corresponding to the target human body region image based on the second feature.
In a possible implementation manner, the second target convolution network includes a first convolution module and a fourth convolution module;
the first convolution module is configured to extract a first feature corresponding to the target human body region image;
the fourth convolution module is configured to generate a limb behavior feature corresponding to the target human body region image based on the first feature.
In a possible implementation manner, the recognition result further includes a gaze recognition result and a payment behavior recognition result; the willingness-to-pay recognition model further comprises a second fully connected layer and a third fully connected layer;
the second fully connected layer is configured to output the gaze recognition result corresponding to the target face brushing user based on the gazing feature;
the third fully connected layer is configured to output the payment behavior recognition result corresponding to the target face brushing user based on the limb behavior feature.
In a possible implementation manner, the first target convolutional network is obtained by training based on human body region images of known willingness-to-pay information and/or gazing information corresponding to the plurality of face-brushing users and a mask map;
the second target convolutional network is obtained by training based on the human body region images of the known willingness-to-pay information and/or payment behavior information corresponding to the plurality of face brushing users.
In a possible implementation, the target human body region image has the same resolution as the target mask image.
The division of the modules in the device for recognizing willingness-to-pay is only used for illustration, and in other embodiments, the device for recognizing willingness-to-pay may be divided into different modules as needed to complete all or part of the functions of the device for recognizing willingness-to-pay. The implementation of each module in the willingness-to-pay recognition apparatus provided in the embodiments of the present specification may be in the form of a computer program. The computer program may be run on a terminal or a server. The program modules constituted by the computer program may be stored on the memory of the terminal or the server. The computer program, when executed by a processor, implements all or part of the steps of the willingness-to-pay recognition method described in the embodiments of the present specification.
Referring to fig. 12, fig. 12 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure. As shown in fig. 12, the electronic device 1200 may include: at least one processor 1210, at least one communication bus 1220, a user interface 1230, at least one network interface 1240, and a memory 1250. The communication bus 1220 may be used for implementing connection communication of the above components.
User interface 1230 may include a Display screen (Display) and a Camera (Camera); optionally, the user interface 1230 may also include a standard wired interface and a wireless interface.
The network interface 1240 may optionally include a bluetooth module, a Near Field Communication (NFC) module, a Wireless Fidelity (Wi-Fi) module, and the like.
Processor 1210 may include one or more processing cores. The processor 1210 connects various parts of the electronic device 1200 through various interfaces and lines, and performs various functions of the electronic device 1200 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1250 and invoking data stored in the memory 1250. Optionally, the processor 1210 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 1210 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU renders and draws the content to be displayed on the display screen; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 1210 and may instead be implemented by a separate chip.
The Memory 1250 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1250 includes a non-transitory computer readable medium. The memory 1250 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 1250 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as an acquisition function, a generation function, a willingness-to-pay recognition function, etc.), instructions for implementing the various method embodiments described above, and the like; the storage data area may store data and the like referred to in the above respective method embodiments. Memory 1250 can also optionally be at least one memory device located remotely from the aforementioned processor 1210. As shown in fig. 12, the memory 1250, which is a type of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and program instructions.
In particular, the processor 1210 may be configured to call program instructions stored in the memory 1250 and specifically perform the following operations:
Acquiring a target face brushing image; the target face brushing image comprises a target face brushing user.
Generating a corresponding target mask image based on the target position of the target face brushing user in the target face brushing image; the target mask image is used to distinguish the face area of the target face brushing user from the other areas except the face area.
Inputting the target face brushing image and the target mask image into a willingness-to-pay recognition model to obtain a gazing feature and a limb behavior feature, and outputting a recognition result corresponding to the target face brushing user based on the gazing feature and the limb behavior feature; the willingness-to-pay recognition model is obtained by training on face brushing images with known willingness-to-pay information corresponding to a plurality of face brushing users.
In some possible embodiments, after the processor 1210 obtains a target face brushing image, and before the target face brushing image and the target mask map are input into a willingness-to-pay recognition model to obtain a gazing feature and a limb behavior feature and a recognition result corresponding to the target face brushing user is output based on the gazing feature and the limb behavior feature, the processor 1210 is further configured to perform:
determining a target human body area image of the target face brushing user based on the face area of the target face brushing user and the target face brushing image; the target human body area image includes a limb of the target face brushing user.
When inputting the target face brushing image and the target mask image into the willingness-to-pay recognition model to obtain the gazing feature and the limb behavior feature, the processor 1210 is specifically configured to perform:
inputting the target human body area image and the target mask image into the willingness-to-pay recognition model to obtain the gazing feature and the limb behavior feature, and outputting the recognition result corresponding to the target face brushing user based on the gazing feature and the limb behavior feature.
In some possible embodiments, when inputting the target human body area image and the target mask map into the willingness-to-pay recognition model to obtain the gazing feature and the limb behavior feature, and outputting the recognition result corresponding to the target face brushing user based on those features, the processor 1210 is specifically configured to perform:
and extracting a first feature corresponding to the target human body area image.
And fusing the first feature and the target mask image to generate a second feature.
And determining the limb behavior characteristics corresponding to the target human body area image based on the first characteristics.
And determining the gazing characteristic corresponding to the target human body area image based on the second characteristic.
And determining the corresponding recognition result of the target face brushing user according to the gazing characteristic and the limb behavior characteristic.
In some possible embodiments, the recognition result includes a willingness-to-pay recognition result; after inputting the target face brushing image and the target mask image into the willingness-to-pay recognition model to obtain the gazing feature and the limb behavior feature and outputting the recognition result corresponding to the target face brushing user based on those features, the processor 1210 is further configured to perform: determining whether the target face brushing user has a willingness to pay based on the recognition result.
In some possible embodiments, the recognition result further includes a gaze recognition result and a payment behavior recognition result;
the processor 1210, when determining whether the target face brushing user has a willingness to pay based on the recognition result, is specifically configured to perform: in a case where the willingness-to-pay recognition result does not satisfy the preset condition, determining whether the target face brushing user has a willingness to pay according to the gaze recognition result and/or the payment behavior recognition result.
In some possible embodiments, the target face brushing image includes a plurality of face brushing users, the plurality of face brushing users including the target face brushing user; after the processor 1210 obtains the target face brushing image, and before the corresponding target mask map is generated based on the target position of the target face brushing user in the target face brushing image, the processor 1210 is further configured to perform:
determining a target face brushing user from the plurality of face brushing users according to a preset rule.
In some possible embodiments, the recognition result includes a willingness-to-pay recognition result; the willingness-to-pay recognition model includes a first target convolutional network, a second target convolutional network and a first fully connected layer;
the first target convolutional network is configured to process the target human body area image and the target mask map to obtain a gazing feature corresponding to the target human body area image; the second target convolutional network is configured to process the target human body area image to obtain a limb behavior feature corresponding to the target human body area image; and the first fully connected layer is used for fusing the gazing feature and the limb behavior feature and outputting the willingness-to-pay recognition result corresponding to the target face brushing user.
In some possible embodiments, the first target convolutional network includes a first convolutional module, a second convolutional module, and a third convolutional module;
the first convolution module is configured to extract a first feature corresponding to the target human body region image;
the second convolution module is configured to fuse the first feature and the target mask map to generate a second feature;
and the third convolution module is configured to generate a gazing feature corresponding to the target human body area image based on the second feature.
In some possible embodiments, the second target convolutional network includes a first convolutional module and a fourth convolutional module;
the first convolution module is configured to extract a first feature corresponding to the target human body region image; the fourth convolution module is configured to generate a limb behavior feature corresponding to the target human body region image based on the first feature.
In some possible embodiments, the recognition result further includes a gaze recognition result and a payment behavior recognition result; the willingness-to-pay recognition model further comprises a second fully connected layer and a third fully connected layer; the second fully connected layer is configured to output the gaze recognition result corresponding to the target face brushing user based on the gazing feature; and the third fully connected layer is configured to output the payment behavior recognition result corresponding to the target face brushing user based on the limb behavior feature.
In some possible embodiments, the first target convolutional network is obtained by training based on human body area images of known willingness-to-pay information and/or gaze information corresponding to the plurality of face brushing users, and a mask map;
and the second target convolutional network is obtained by training based on human body area images of known willingness-to-pay information and/or payment behavior information corresponding to the plurality of face brushing users.
In some possible embodiments, the target body region image has the same resolution as the target mask image.
The present specification also provides a computer readable storage medium having stored therein instructions, which when executed on a computer or processor, cause the computer or processor to perform one or more of the steps of the above embodiments. If the various constituent modules of the willingness-to-pay recognition apparatus are implemented in the form of software functional units and sold or used as independent products, they may be stored in the computer-readable storage medium.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this specification are performed wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on, or transmitted over, a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another via wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a Digital Versatile Disc (DVD)), a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program; the program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk. The technical features of the above examples and embodiments may be combined arbitrarily as long as they do not conflict.
The above-described embodiments are merely preferred embodiments of the present disclosure, and are not intended to limit the scope of the present disclosure, and various modifications and improvements made to the technical solutions of the present disclosure by those skilled in the art without departing from the design spirit of the present disclosure should fall within the protection scope defined by the claims.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims and in the specification may be performed in an order different than in the embodiments recited in the specification and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Claims (16)

1. A willingness-to-pay recognition method, the method comprising:
acquiring a target face brushing image; the target face brushing image comprises a target face brushing user;
generating a corresponding target mask image based on a target position of the target face brushing user in the target face brushing image; the target mask image is used for distinguishing the face area of the target face brushing user from other areas except the face area;
inputting the target face brushing image and the target mask image into a willingness-to-pay recognition model to obtain gazing characteristics and limb behavior characteristics, and outputting a recognition result corresponding to the target face brushing user based on the gazing characteristics and the limb behavior characteristics; the willingness-to-pay recognition model is obtained by training face brushing images of known willingness-to-pay information corresponding to a plurality of face brushing users.
2. The method of claim 1, wherein after the target face brushing image is obtained, and before the target face brushing image and the target mask image are input into a willingness-to-pay recognition model to obtain a gazing characteristic and a limb behavior characteristic and a recognition result corresponding to the target face brushing user is output based on the gazing characteristic and the limb behavior characteristic, the method further comprises:
determining a target human body area image of the target face brushing user based on the face area of the target face brushing user and the target face brushing image; the target human body area image comprises a limb of the target face brushing user;
wherein inputting the target face brushing image and the target mask image into the willingness-to-pay recognition model to obtain the gazing characteristic and the limb behavior characteristic, and outputting the recognition result corresponding to the target face brushing user based on the gazing characteristic and the limb behavior characteristic, comprises:
inputting the target human body area image and the target mask image into the willingness-to-pay recognition model to obtain the gazing characteristic and the limb behavior characteristic, and outputting the recognition result corresponding to the target face brushing user based on the gazing characteristic and the limb behavior characteristic.
3. The method of claim 2, wherein inputting the target human body area image and the target mask image into the willingness-to-pay recognition model to obtain the gazing characteristic and the limb behavior characteristic, and outputting the recognition result corresponding to the target face brushing user based on the gazing characteristic and the limb behavior characteristic, comprises:
extracting a first feature corresponding to the target human body area image;
fusing the first feature with the target mask image to generate a second feature;
determining a limb behavior characteristic corresponding to the target human body area image based on the first characteristic;
determining a gazing characteristic corresponding to the target human body area image based on the second characteristic;
and determining the corresponding recognition result of the target face brushing user according to the gazing characteristic and the limb behavior characteristic.
4. The method of claim 1 or 2, the recognition result comprising a willingness-to-pay recognition result;
wherein after the target face brushing image and the target mask image are input into the willingness-to-pay recognition model to obtain the gazing characteristic and the limb behavior characteristic, and the recognition result corresponding to the target face brushing user is output based on the gazing characteristic and the limb behavior characteristic, the method further comprises:
determining whether the target face brushing user has a willingness to pay based on the recognition result.
5. The method of claim 4, the recognition results further comprising gaze recognition results and payment behavior recognition results;
the determining whether the target face brushing user has a willingness-to-pay based on the recognition result includes:
and under the condition that the recognition result of the willingness-to-pay does not meet the preset condition, determining whether the target face brushing user has willingness-to-pay according to the gaze recognition result and/or the payment behavior recognition result.
6. The method of claim 1, the target face brushing image comprising a plurality of face brushing users, the plurality of face brushing users including the target face brushing user;
wherein after the target face brushing image is acquired, and before the corresponding target mask image is generated based on the target position of the target face brushing user in the target face brushing image, the method further comprises:
determining a target face brushing user from the plurality of face brushing users according to a preset rule.
7. The method of claim 2, the recognition result comprising a willingness-to-pay recognition result; the willingness-to-pay recognition model comprises a first target convolutional network, a second target convolutional network and a first fully connected layer;
the first target convolutional network is used for processing the target human body area image and the target mask image to obtain a gazing characteristic corresponding to the target human body area image;
the second target convolutional network is used for processing the target human body area image to obtain the limb behavior characteristics corresponding to the target human body area image;
and the first fully connected layer is used for fusing the gazing characteristic and the limb behavior characteristic and outputting the willingness-to-pay recognition result corresponding to the target face brushing user.
8. The method of claim 7, the first target convolutional network comprising a first convolutional module, a second convolutional module, a third convolutional module;
the first convolution module is used for extracting a first feature corresponding to the target human body area image;
the second convolution module is used for fusing the first feature and the target mask image to generate a second feature;
and the third convolution module is used for generating a gazing characteristic corresponding to the target human body area image based on the second characteristic.
9. The method of claim 7, the second target convolutional network comprising a first convolutional module and a fourth convolutional module;
the first convolution module is used for extracting a first feature corresponding to the target human body area image;
the fourth convolution module is configured to generate a limb behavior feature corresponding to the target human body area image based on the first feature.
10. The method of claim 7, the recognition result further comprising a gaze recognition result and a payment behavior recognition result; the willingness-to-pay recognition model further comprises a second fully connected layer and a third fully connected layer;
the second fully connected layer is used for outputting the gaze recognition result corresponding to the target face brushing user based on the gazing characteristic;
and the third fully connected layer is used for outputting the payment behavior recognition result corresponding to the target face brushing user based on the limb behavior characteristic.
11. The method of claim 7, wherein the first target convolutional network is obtained by training based on human body region images of known willingness-to-pay information and/or gaze information and mask images corresponding to the plurality of face brushing users;
and the second target convolutional network is obtained by training based on human body region images of known willingness-to-pay information and/or payment behavior information corresponding to the plurality of face brushing users.
12. The method of any of claims 2-3 or 7-11, wherein the target human body area image has the same resolution as the target mask image.
13. A willingness-to-pay recognition device, the device comprising:
the acquisition module is used for acquiring a target face brushing image; the target face brushing image comprises a target face brushing user;
the generating module is used for generating a corresponding target mask image based on the target position of the target face brushing user in the target face brushing image; the target mask image is used for distinguishing the face area of the target face brushing user from other areas except the face area;
the willingness-to-pay recognition module is used for inputting the target face brushing image and the target mask image into a willingness-to-pay recognition model to obtain a gazing characteristic and a limb behavior characteristic, and outputting a recognition result corresponding to the target face brushing user based on the gazing characteristic and the limb behavior characteristic; the willingness-to-pay recognition model is obtained by training based on face brushing images of known willingness-to-pay information corresponding to a plurality of face brushing users.
14. An electronic device, comprising: a processor and a memory;
the processor is connected with the memory;
the memory for storing executable program code;
the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory for performing the method of any one of claims 1-12.
15. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any of claims 1-12.
16. A computer program product containing instructions which, when run on a computer or processor, cause the computer or processor to carry out a willingness-to-pay identification method according to any one of claims 1-12.