CN115222427A - Artificial intelligence-based fraud risk identification method and related equipment
- Publication number
- CN115222427A (application number CN202210902732.4A)
- Authority
- CN
- China
- Prior art keywords
- fixation point
- point
- image
- fraud risk
- real
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06Q30/0185—Product, service or business identity fraud
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V40/168—Feature extraction; Face representation (human faces)
- G06V40/172—Classification, e.g. identification (human faces)
- G06V40/193—Preprocessing; Feature extraction (eye characteristics)
- G06V40/197—Matching; Classification (eye characteristics)
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Business, Economics & Management (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Ophthalmology & Optometry (AREA)
- Accounting & Taxation (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Entrepreneurship & Innovation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Technology Law (AREA)
- Image Analysis (AREA)
Abstract
The application provides a fraud risk identification method and apparatus based on artificial intelligence, an electronic device and a storage medium. The fraud risk identification method based on artificial intelligence comprises: collecting a real-time answer video of a target person; acquiring a fixation point detection network, and obtaining the fixation point coordinates of each frame of face image in the real-time answer video based on the fixation point detection network; constructing a fixation point characteristic vector of the real-time answer video based on the fixation point coordinates; collecting the fixation point characteristic vectors of all answer videos in historical data to construct a fixation point characteristic set, and carrying out cluster analysis on the fixation point characteristic set based on a clustering algorithm to obtain clustering centers; and comparing the distances from the fixation point characteristic vector of the real-time answer video to the different clustering centers to obtain the fraud risk identification result of the target person. With the method and the device, the fraud risk identification result can be obtained accurately and quickly based on the fixation point characteristic vector of the real-time answer video, improving the accuracy and timeliness of fraud risk identification.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a fraud risk identification method and apparatus based on artificial intelligence, an electronic device, and a storage medium.
Background
Fraud risk identification has long been a difficult problem in the risk control field. In financial scenarios, such as large-amount transfers and loan applications, whether a user poses a fraud risk must be watched closely in order to safeguard funds and property.
At present, traditional fraud risk identification generally relies on correlation analysis of text, browsing records, device IP (Internet Protocol) addresses, or answers to test questions. However, this approach has a certain time delay, and text, browsing records and the like are easily tampered with, so fraud risk identification suffers from a high error rate and poor timeliness.
Disclosure of Invention
In view of the above, there is a need to provide a fraud risk identification method based on artificial intelligence and related apparatus, so as to solve the technical problem of how to improve the accuracy and timeliness of fraud risk identification, wherein the related apparatus includes a fraud risk identification device based on artificial intelligence, an electronic apparatus and a storage medium.
The application provides a fraud risk identification method based on artificial intelligence, which comprises the following steps:
acquiring a real-time answer video of a target person, wherein the real-time answer video comprises face images of continuous multiple frames in the answer process of the target person;
acquiring a fixation point detection network, and acquiring fixation point coordinates of each frame of face image in the real-time answer video based on the fixation point detection network;
constructing a fixation point characteristic vector of the real-time answer video based on the fixation point coordinates of the face image;
collecting the fixation point characteristic vectors of all answer videos in historical data to construct a fixation point characteristic set, and carrying out cluster analysis on the fixation point characteristic set based on a clustering algorithm to obtain a clustering center, wherein the clustering center reflects the average characteristic of each category;
and comparing the distances between the fixation point characteristic vectors of the real-time answer video and different clustering centers to obtain the fraud risk identification result of the target person.
In some embodiments, the obtaining a gazing point detection network and obtaining a gazing point coordinate of each frame of face image in the real-time answer video based on the gazing point detection network includes:
constructing a point of regard detection initial network, wherein the point of regard detection initial network comprises a first encoder, a second encoder and a full connection layer;
training the fixation point detection initial network according to a preset loss function to obtain a fixation point detection network, wherein the input of the fixation point detection network is a human face image, and the output of the fixation point detection network is the fixation point coordinate of the human face image;
and sequentially inputting all the face images in the real-time answer video into the fixation point detection network to obtain the fixation point coordinates of each frame of face image.
In some embodiments, the training the initial gaze point detection network according to the preset loss function to obtain the gaze point detection network includes:
a1, collecting a large number of face images in historical data as training data, and acquiring label data of each face image in the training data, wherein the label data comprises an eye area label image, a face area label image and a fixation point coordinate label;
a2, obtaining a first saliency map based on a pixel value in a first feature map, and constructing a first preset loss function based on the first saliency map and the eye region label map, where the first feature map is an output result of the first encoder, and the first preset loss function satisfies a relation:
wherein I(i,j) is the pixel value of pixel point (i, j) in the eye region label map of the face image input to the initial gaze point detection network; the other term is the pixel value of pixel point (i, j) in the first saliency map, with a value range of [0, 1]; W × H is the width and height of the input face image; Loss1 is the value of the first preset loss function; and f(X) is a custom function that satisfies the relation:
a3, acquiring a second saliency map based on pixel values in a second feature map, and constructing a second preset loss function based on the second saliency map and the face region label map, where the second feature map is an output result of the second encoder, and the second preset loss function satisfies a relation:
wherein one term is the pixel value of pixel point (i, j) in the face region label map of the face image input to the initial gaze point detection network; the other is the pixel value of pixel point (i, j) in the second saliency map, with a value range of [0, 1]; W × H is the width and height of the input face image; Loss2 is the value of the second preset loss function; and f(X) is a custom function that satisfies the relation:
a4, constructing a third preset loss function based on the output result of the gaze point detection initial network and the gaze point coordinate label, wherein the third preset loss function satisfies the relation:
Loss3 = (x* - x)² + (y* - y)²
wherein x*, y* are the gaze point coordinate label of the face image input to the initial gaze point detection network, x and y are the output of the initial gaze point detection network, and Loss3 is the value of the third preset loss function;
and A5, adding the first preset loss function, the second preset loss function and the third preset loss function to serve as the preset loss function, and training the initial network for detecting the fixation point based on the preset loss function and the training data to obtain the network for detecting the fixation point.
In some embodiments, the acquiring a large number of face images in the history data as training data, and acquiring label data of each face image in the training data, where the label data includes an eye region label map, a face region label map, and a gaze point coordinate label, includes:
setting the pixel value of the eye region in the face image to be 1, and setting the pixel values of other regions to be 0, so as to obtain an eye region label image of each face image in the training data;
setting the pixel value of a face area in the face image to be 1, and setting the pixel values of other areas to be 0, so as to obtain a face area label image of each face image in the training data;
and acquiring the fixation point coordinate of each face image in the training data as the fixation point coordinate label of the face image.
In some embodiments, said obtaining a first saliency map based on pixel values in a first feature map comprises:
calculating the mean value of pixel values in each image channel in the first feature map;
normalizing all the mean values to obtain a weight factor of each image channel, wherein the weight factors satisfy the following relation:
wherein the numerator is the mean pixel value of image channel c1, the denominator is the sum of the mean pixel values of all image channels, and the result is the weight factor of image channel c1;
performing weighted summation on all image channels based on the weighting factors to obtain an initial significant image, wherein the size of the initial significant image is equal to that of each image channel;
and carrying out interpolation processing on the initial salient image based on an interpolation algorithm to obtain a first salient image, wherein the size of the first salient image is equal to that of the input face image of the initial gaze point detection network.
In some embodiments, the clustering the set of gaze point features based on a clustering algorithm to obtain a cluster center, the cluster center reflecting an average feature of each category, includes:
dividing the gaze point feature vectors in the gaze point feature set into two clusters based on a clustering algorithm, and calculating the clustering centers of the two clusters, wherein the clusters comprise all gaze point feature vectors belonging to the same class;
and counting the number of gaze point feature vectors in each cluster, taking the cluster center of the cluster with the smaller number as the abnormal cluster center, and taking the cluster center of the cluster with the larger number as the normal cluster center.
In some embodiments, the comparing distances between the gaze point feature vectors of the real-time answer video and different cluster centers to obtain the fraud risk identification result of the target person includes:
calculating Euclidean distance between the gaze point characteristic vector of the real-time answer video and the normal clustering center to serve as a first distance;
calculating Euclidean distance between the fixation point characteristic vector of the real-time answer video and the abnormal clustering center as a second distance;
comparing the first distance with the second distance to obtain a fraud risk identification result of the target person, wherein if the second distance is greater than the first distance, the fraud risk identification result of the target person is that no fraud risk exists; and if the second distance is not greater than the first distance, the fraud risk identification result of the target person is that fraud risk exists.
The embodiment of the present application further provides a fraud risk identification apparatus based on artificial intelligence, the apparatus includes:
the collecting unit is used for collecting a real-time answer video of a target person, wherein the real-time answer video comprises continuous multi-frame face images in the answer process of the target person;
the fixation point detection unit is used for acquiring a fixation point detection network and obtaining the fixation point coordinates of each frame of face image in the real-time answer video based on the fixation point detection network;
the construction unit is used for constructing the fixation point characteristic vector of the real-time answer video based on the fixation point coordinates of the face image;
the clustering unit is used for collecting the fixation point characteristic vectors of all answer videos in historical data to construct a fixation point characteristic set, and carrying out cluster analysis on the fixation point characteristic set based on a clustering algorithm to obtain a clustering center, wherein the clustering center reflects the average characteristic of each category;
and the acquisition unit is used for comparing the distances between the fixation point characteristic vectors of the real-time answer videos and different clustering centers to acquire the fraud risk identification result of the target person.
An embodiment of the present application further provides an electronic device, where the electronic device includes:
a memory storing at least one instruction;
and the processor executes the instructions stored in the memory to realize the artificial intelligence based fraud risk identification method.
The embodiment of the application also provides a computer-readable storage medium, and at least one instruction is stored in the computer-readable storage medium and executed by a processor in the electronic device to implement the artificial intelligence based fraud risk identification method.
In conclusion, the method and the device can collect the real-time answer video of the person in the answer process, obtain the point-of-regard characteristic vector of the real-time answer video based on the point-of-regard detection network, perform unsupervised cluster analysis on the point-of-regard characteristic vector and historical data, accurately and quickly obtain the fraud risk identification result of the person, and improve the accuracy and timeliness of fraud risk identification.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of an artificial intelligence based fraud risk identification method to which the present application relates.
Fig. 2 is a schematic structural diagram of an initial network for gaze point detection according to the present application.
FIG. 3 is a functional block diagram of a preferred embodiment of an artificial intelligence based fraud risk identification apparatus according to the present application.
Fig. 4 is a schematic structural diagram of an electronic device according to a preferred embodiment of the artificial intelligence based fraud risk identification method according to the present application.
Detailed Description
For a clearer understanding of the objects, features and advantages of the present application, reference will now be made in detail to the present application with reference to the accompanying drawings and specific examples. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, and the described embodiments are merely some, but not all embodiments of the present application.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The embodiments of the present application provide a fraud risk identification method based on artificial intelligence, which can be applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product capable of performing human-computer interaction with a client, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an Internet Protocol Television (IPTV), an intelligent wearable device, and the like.
The electronic device may also include a network device and/or a client device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network servers.
The Network where the electronic device is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
Fig. 1 is a flow chart of a preferred embodiment of the fraud risk identification method based on artificial intelligence according to the present application. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
And S10, acquiring a real-time answer video of the target person, wherein the real-time answer video comprises continuous multi-frame face images in the answer process of the target person.
In an optional embodiment, in processes such as a personal loan or a large-amount transfer, the target person first submits a related application, the bank then preliminarily reviews the target person's related information, and after the preliminary review passes, the agent and the target person conduct an online or offline signing interview. In this stage, the target person needs to answer the test questions in an electronic questionnaire to test whether the target person poses a fraud risk. It should be noted that the application scenario described in this alternative embodiment is only an example, and is not limiting.
In this optional embodiment, when the target person answers the test questions in the electronic questionnaire, the target person clicks a "start answering" button on the terminal device, at this time, the image acquisition device in the terminal device is triggered, the image acquisition device starts to acquire continuous multiframe human face images of the target person at a fixed frame rate, the answering time of each test question is fixed, when answering of all the questions is completed, the image acquisition device in the terminal device stops the acquisition of the human face images, and all the human face images acquired in the answering process of the target person are used as the real-time answering video of the target person. The terminal device may be a mobile phone, a tablet computer, or the like, and the application is not limited.
In this optional embodiment, since the frame rate of the image acquisition device and the response time of the test question are both fixed and unchanged, and the test questions in the electronic questionnaire are the same in the same application scenario, the number of frames of the face images included in all the acquired answer videos is the same.
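To make the acquisition step concrete, the sketch below illustrates one way the answer video could be captured; it is not part of the patent text, and the camera index, frame rate and timing loop are assumptions.

```python
import time
import cv2

def capture_answer_video(num_questions: int, seconds_per_question: int, fps: int = 10):
    """Grab frames at an (approximately) fixed frame rate for a fixed answering time."""
    frames = []
    cap = cv2.VideoCapture(0)  # terminal device camera; index 0 is an assumption
    total_frames = num_questions * seconds_per_question * fps
    try:
        for _ in range(total_frames):
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)      # one face image per captured frame
            time.sleep(1.0 / fps)     # approximate the fixed frame rate
    finally:
        cap.release()
    return frames                      # the "real-time answer video"
```

Because the number of questions, the per-question answering time and the frame rate are all fixed, every answer video produced this way contains the same number of face images.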
Therefore, the real-time answer video of the target person can be collected, the real-time answer video comprises the face images of continuous multiple frames of the target person in the answer process, and a data basis is provided for subsequent fraud risk identification.
S11, a fixation point detection network is obtained, and fixation point coordinates of each frame of face image in the real-time answer video are obtained based on the fixation point detection network.
In an optional embodiment, the obtaining a gazing point detection network and obtaining a gazing point coordinate of each frame of face image in the real-time answer video based on the gazing point detection network includes:
constructing a point of regard detection initial network, wherein the point of regard detection initial network comprises a first encoder, a second encoder and a full connection layer;
training the gaze point detection initial network according to a preset loss function to obtain a gaze point detection network, wherein the input of the gaze point detection network is a face image, and the output of the gaze point detection network is a gaze point coordinate of the face image;
and sequentially inputting all the face images in the real-time answer video into the fixation point detection network to obtain the fixation point coordinates of each frame of face image.
In this optional embodiment, an initial gaze point detection network is established. The input of the initial network is a face image from an answer video, and the expected output is the gaze point coordinates of the face image, which comprise coordinate positions in two dimensions, the horizontal direction x and the vertical direction y. The position of the gaze point in a face image is related not only to the detail features of the eye region but also to the global features of the face region, so the initial gaze point detection network comprises a first encoder, a second encoder and a fully connected layer: the first encoder is used to extract the detail features of the eye region, and the second encoder is used to extract the global features of the face region. The first encoder extracts the features of the eye region in the input face image to obtain a first feature map of size w1 × h1 × c1, where w1 × h1 is the width and height of the first feature map and c1 is its number of image channels; the size of the first feature map is related to the structure of the first encoder. The second encoder extracts the features of the face region in the input face image to obtain a second feature map of size w2 × h2 × c2, where w2 × h2 is the width and height of the second feature map and c2 is its number of image channels; the size of the second feature map is related to the structure of the second encoder. All values in the first feature map and the second feature map are concatenated into a one-dimensional vector, which is input to the fully connected layer as a fusion vector, and the output of the fully connected layer is the coordinates of the gaze point in the input face image; a structural diagram of the initial gaze point detection network is shown in fig. 2. The fully connected layer comprises an input layer, hidden layers and an output layer; the number of neurons in the input layer equals the number of values in the fusion vector, the number of neurons in the output layer is 2, and the number of hidden layers and the number of neurons in each hidden layer are not limited in this application. The first encoder and the second encoder may adopt existing encoder structures such as ResNet or DenseNet, and this application is not limited thereto.
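As a minimal, hedged sketch (not the patent's exact implementation), the structure described above can be expressed as follows; the convolutional backbones, pooled feature-map size and hidden-layer width are illustrative assumptions, and the encoders could instead be ResNet or DenseNet as noted.

```python
import torch
import torch.nn as nn

class GazePointNet(nn.Module):
    def __init__(self, fused_dim=2 * 128 * 7 * 7, hidden_dim=256):
        super().__init__()
        # First encoder: detail features of the eye region (placeholder conv stack).
        self.eye_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)),
        )
        # Second encoder: global features of the face region (placeholder conv stack).
        self.face_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)),
        )
        # Fully connected layers: input sized to the fusion vector, output = (x, y).
        self.head = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, face_img):
        f1 = self.eye_encoder(face_img)   # first feature map, c1 x h1 x w1
        f2 = self.face_encoder(face_img)  # second feature map, c2 x h2 x w2
        fused = torch.cat([f1.flatten(1), f2.flatten(1)], dim=1)  # fusion vector
        return self.head(fused), f1, f2   # gaze point (x, y) plus both feature maps
```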
In this optional embodiment, the gaze point detection initial network is trained according to a preset loss function, parameters in the gaze point detection initial network are continuously updated, the gaze point detection initial network is constrained to output coordinates of a gaze point in a face image, and the gaze point detection network is obtained. The training process of the initial network for detecting the gaze point is as follows:
a1, collecting a large number of face images in historical data as training data, and obtaining label data of each face image in the training data, wherein the label data comprises an eye area label image, a face area label image and a fixation point coordinate label.
In this optional embodiment, the pixel value of the eye region in the face image is set to 1, and the pixel values of other regions are set to 0, so as to obtain an eye region label map of each face image in the training data; setting the pixel value of a face area in the face image to be 1, and setting the pixel values of other areas to be 0, so as to obtain a face area label image of each face image in the training data; and acquiring the fixation point coordinate of each face image in the training data as a fixation point coordinate label of the face image, wherein the fixation point coordinate is acquired in a manual labeling mode.
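A small illustrative sketch of assembling this label data follows; the region boxes, image size and example coordinates are hypothetical, and the gaze point label would in practice come from manual annotation as described above.

```python
import numpy as np

def region_label_map(height, width, box):
    """box: (top, bottom, left, right) of the region in pixel coordinates."""
    mask = np.zeros((height, width), dtype=np.uint8)
    top, bottom, left, right = box
    mask[top:bottom, left:right] = 1   # region pixels set to 1, all others stay 0
    return mask

# Example label data for one 224x224 training image with assumed region boxes:
eye_label = region_label_map(224, 224, (80, 110, 60, 170))    # eye region label map
face_label = region_label_map(224, 224, (40, 210, 50, 180))   # face region label map
gaze_label = (0.42, 0.37)            # manually annotated gaze point coordinate label
```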
A2, acquiring a first saliency map based on pixel values in the first feature map, wherein the pixel points in the first saliency map correspond to the pixel points in the input face image one by one, and the pixel values of the pixel points in the first saliency map represent the saliency of the features of the pixel points in the input face image in the first feature map; in order to enable a first encoder to extract detail features of an eye region in an input face image, a first preset loss function is constructed based on the first saliency map and the eye region label map, the first preset loss function can constrain pixel points with pixel values larger than 0.5 in the first saliency map to be located in the eye region, and the first preset loss function satisfies the relation:
wherein I(i,j) is the pixel value of pixel point (i, j) in the eye region label map of the face image input to the initial gaze point detection network; the other term is the pixel value of pixel point (i, j) in the first saliency map, with a value range of [0, 1]; W × H is the width and height of the input face image; Loss1 is the value of the first preset loss function; and f(X) is a custom function that satisfies the relation:
in this alternative embodiment, the obtaining the first saliency map based on the pixel values in the first feature map includes:
calculating the mean pixel value of each image channel in the first feature map; the first feature map has a size of w1 × h1 × c1 and contains c1 image channels in total, each of size w1 × h1, so c1 mean values are obtained in total;
normalizing all the mean values to obtain the weight factor of each image channel; taking image channel c1 as an example, its weight factor is calculated as follows:
wherein the numerator is the mean pixel value of image channel c1, the denominator is the sum of the mean pixel values of all image channels, and the result is the weight factor of image channel c1;
performing a weighted summation over all image channels based on the weight factors to obtain an initial salient image, whose size is w1 × h1, equal to the size of each image channel;
and performing interpolation processing on the initial salient image based on an interpolation algorithm to obtain a first salient image, wherein the first salient image is equal to the input face image of the initial gaze point detection network in size, and the interpolation algorithm can adopt a linear interpolation algorithm.
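The four sub-steps above can be sketched as follows, assuming a feature map shaped (channels, height, width) and using bilinear interpolation as one possible choice of interpolation algorithm.

```python
import torch
import torch.nn.functional as F

def saliency_map(feature_map: torch.Tensor, out_h: int, out_w: int) -> torch.Tensor:
    """feature_map: (c, h, w); returns a saliency map matching the input image size."""
    channel_means = feature_map.mean(dim=(1, 2))             # one mean per image channel
    weights = channel_means / channel_means.sum()            # normalized weight factors
    initial = (weights[:, None, None] * feature_map).sum(0)  # weighted sum, shape (h, w)
    resized = F.interpolate(initial[None, None], size=(out_h, out_w),
                            mode="bilinear", align_corners=False)
    return resized[0, 0]                                      # same size as the input image
```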
A3, according to the same method, acquiring a second saliency map based on pixel values in the second feature map, wherein the pixel points in the second saliency map correspond to the pixel points in the input face image one by one, and the acquisition method of the second saliency map is the same as that of the first saliency map; in order to enable the second encoder to extract the global features of the face region in the input face image, a second preset loss function is constructed based on the second saliency map and the face region label map, the second preset loss function may constrain that pixel points with pixel values greater than 0.5 in the second saliency map are all located in the face region, and the second preset loss function satisfies the relation:
wherein one term is the pixel value of pixel point (i, j) in the face region label map of the face image input to the initial gaze point detection network; the other is the pixel value of pixel point (i, j) in the second saliency map, with a value range of [0, 1]; W × H is the width and height of the input face image; Loss2 is the value of the second preset loss function; and f(X) is a custom function that satisfies the relation:
a4, constructing a third preset loss function based on the output result of the gaze point detection initial network and the gaze point coordinate label, wherein the third preset loss function can constrain the output of the gaze point detection initial network as the coordinates of the gaze point in the face image, and the third preset loss function satisfies the relation:
Loss3 = (x* - x)² + (y* - y)²
wherein x*, y* are the gaze point coordinate label of the face image input to the initial gaze point detection network, x and y are the output of the initial gaze point detection network, and Loss3 is the value of the third preset loss function.
And A5, adding the first preset loss function, the second preset loss function and the third preset loss function to form the preset loss function, training the gaze point detection initial network based on the preset loss function and the training data, continuously inputting the face image in the training data into the gaze point detection initial network to obtain the value of the preset loss function, updating the network parameters by using a gradient descent method after obtaining the value of the preset loss function in each iteration, calculating the reduction value of the preset loss function in two adjacent iterations, and stopping training to obtain the gaze point detection network when the reduction value is smaller than a preset threshold value. Wherein the preset threshold is 0.001.
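A hedged sketch of this training procedure is shown below. Because the exact formulas of the first and second preset loss functions are not reproduced in the text above, they appear as placeholder callables; the gradient descent step, learning rate and batching are illustrative assumptions.

```python
import torch

def train(model, loader, loss1_fn, loss2_fn, lr=1e-4, threshold=1e-3):
    """loss1_fn / loss2_fn: placeholders mapping (feature map, label map) to a scalar."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)    # plain gradient descent
    prev = None
    for face_img, eye_mask, face_mask, gaze_xy in loader:     # training data with labels
        pred_xy, f1, f2 = model(face_img)                     # network from the sketch above
        loss3 = ((pred_xy - gaze_xy) ** 2).sum(dim=1).mean()  # squared coordinate error
        loss = loss1_fn(f1, eye_mask) + loss2_fn(f2, face_mask) + loss3
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if prev is not None and prev - loss.item() < threshold:
            break                                             # decrease below preset threshold
        prev = loss.item()
    return model
```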
In this optional embodiment, all face images in the real-time answer video are sequentially input to the gaze point detection network to obtain the gaze point coordinates of each frame of face image.
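Running the trained network over the answer video could then look like the following sketch, assuming each frame has already been preprocessed into a tensor accepted by the network and that the network returns the predicted coordinates together with its two feature maps, as in the sketch above.

```python
import torch

@torch.no_grad()
def gaze_points_per_frame(model, frames):
    """frames: list of preprocessed face-image tensors shaped (3, H, W)."""
    model.eval()
    coords = []
    for img in frames:
        pred_xy, _, _ = model(img.unsqueeze(0))       # add a batch dimension
        coords.append(tuple(pred_xy.squeeze(0).tolist()))
    return coords                                      # one (x, y) gaze point per frame
```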
Therefore, the fixation point coordinate of each frame of face image in the real-time answer video of the target person is obtained by means of the fixation point detection network, and in the fixation point detection network, the detail characteristics of the eye area in the face image and the global characteristics of the face area are fused to obtain an accurate fixation point detection result; meanwhile, the gazing point characteristics are difficult to cover, and the identification accuracy can be improved by acquiring the fraud risk identification result based on the gazing point characteristics.
And S12, constructing the fixation point characteristic vector of the real-time answer video based on the fixation point coordinates of the face image.
In an optional embodiment, a timestamp of each frame of face image in a real-time answer video of a target person is obtained, wherein the timestamp reflects the acquisition time of the face image; and arranging the fixation point coordinates of each frame of face image according to the sequence of the timestamps to obtain the fixation point characteristic vector of the real-time answer video.
Exemplarily, assuming that the real-time answer video of the target person contains 3 frames of face images, the gaze point coordinates of the 3 frames are obtained in timestamp order as (x1, y1), (x2, y2), (x3, y3); the gaze point feature vector of the real-time answer video is then (x1, y1, x2, y2, x3, y3).
Therefore, the point of regard characteristic vector in the real-time answer video can be obtained, and the point of regard characteristic vector provides a data basis for subsequent fraud risk identification.
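A minimal sketch of this construction, with assumed field names for the per-frame records, is:

```python
import numpy as np

def gaze_feature_vector(frames):
    """frames: iterable of dicts like {"timestamp": t, "x": x, "y": y}, one per frame."""
    ordered = sorted(frames, key=lambda f: f["timestamp"])    # chronological order
    coords = [(f["x"], f["y"]) for f in ordered]
    return np.array(coords, dtype=float).ravel()              # (x1, y1, x2, y2, ...)

# Example with three frames, matching the worked example above:
vec = gaze_feature_vector([
    {"timestamp": 2, "x": 0.3, "y": 0.1},
    {"timestamp": 1, "x": 0.2, "y": 0.4},
    {"timestamp": 3, "x": 0.5, "y": 0.6},
])
# vec -> [0.2, 0.4, 0.3, 0.1, 0.5, 0.6]
```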
S13, collecting the fixation point characteristic vectors of all answer videos in the historical data to construct a fixation point characteristic set, and carrying out clustering analysis on the fixation point characteristic set based on a clustering algorithm to obtain a clustering center, wherein the clustering center reflects the average characteristic of each category.
In an optional embodiment, all answer videos in the historical data are collected, and the point-of-regard feature vector of each answer video is obtained according to the same method to construct a point-of-regard feature set; because the number of frames of the face images contained in all the answer videos is the same, the number of numerical values contained in the gazing point characteristic vectors of different answer videos is the same.
In this optional embodiment, the performing, based on a clustering algorithm, a clustering analysis on the gaze point feature set to obtain a clustering center, where the clustering center reflects an average feature of each category, includes:
dividing the gaze point feature vectors in the gaze point feature set into two clusters based on a clustering algorithm, and calculating the clustering centers of the two clusters, wherein the clusters comprise all gaze point feature vectors belonging to the same class;
and counting the number of gaze point feature vectors in each cluster, taking the cluster center of the cluster with the smaller number as the abnormal cluster center, and taking the cluster center of the cluster with the larger number as the normal cluster center.
In this optional embodiment, all the gaze point feature vectors in the gaze point feature set may be divided into two categories, i.e., a normal feature vector without a fraud risk and an abnormal feature vector with a fraud risk, and the clustering algorithm may adopt a distance-based Kmeans algorithm, so that the gaze point feature vectors in the gaze point feature set are divided into two clusters based on the clustering algorithm, and clustering centers of the two clusters are calculated, where the clusters include all the gaze point feature vectors belonging to the same category, and the method includes:
a1, firstly setting the classification number of a clustering algorithm as K =2, and randomly selecting two fixation point feature vectors as two initial clustering centers from the fixation point feature set;
a2, calculating Euclidean distance between each point of regard feature vector and each initial clustering center in the point of regard feature set, and distributing the point of regard feature vector to a class corresponding to the initial clustering center with the minimum distance to obtain two class clusters, wherein the class clusters comprise all point of regard feature vectors belonging to the same class, and the class clusters are in one-to-one correspondence with the initial clustering centers;
a3, calculating the mean value of all the characteristic vectors of the fixation points in the same cluster to obtain the mean value center of each cluster, taking the mean value center as the updated initial cluster center of the corresponding cluster to finish the updating of the initial cluster center, wherein each cluster corresponds to one initial cluster center before updating and one updated initial cluster center;
a4, calculating the Euclidean distance between the updated initial clustering center and the initial clustering center before the update in each cluster as the deviation distance, and calculating the sum of the deviation distances of all clusters as the total deviation distance; if the total deviation distance is greater than a preset threshold value, repeating step a2 and step a3 to continue updating the initial clustering centers; if the total deviation distance is not greater than the preset threshold value, stopping the update and taking the updated initial clustering centers obtained in the last update as the clustering centers. Because the number of classes of the clustering algorithm is K = 2, two clustering centers and the cluster corresponding to each clustering center are obtained, and the clustering centers reflect the average features of each category. The preset threshold value is 0.5.
In this optional embodiment, the number of gaze point feature vectors in each cluster is counted. Since fraud risk is a minority case in actual scenarios, the cluster with the smaller number is taken as the abnormal cluster with fraud risk, and its cluster center as the abnormal cluster center; the cluster with the larger number is taken as the normal cluster without fraud risk, and its cluster center as the normal cluster center.
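The clustering procedure in steps a1 to a4, together with the cluster-size rule above, can be sketched as follows; the stopping threshold of 0.5 follows the description, while the random initialization and everything else are illustrative assumptions (empty clusters are not handled in this sketch).

```python
import numpy as np

def cluster_centers(features: np.ndarray, threshold: float = 0.5, seed: int = 0):
    """features: (N, D) array of gaze point feature vectors from historical data."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=2, replace=False)]  # a1: random init
    while True:
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                        # a2: assign to nearest center
        new_centers = np.stack([features[labels == k].mean(axis=0) for k in (0, 1)])  # a3
        total_shift = np.linalg.norm(new_centers - centers, axis=1).sum()  # a4: deviation
        centers = new_centers
        if total_shift < threshold:
            break
    counts = np.bincount(labels, minlength=2)
    abnormal_center = centers[counts.argmin()]               # fewer members -> fraud risk
    normal_center = centers[counts.argmax()]
    return normal_center, abnormal_center
```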
Therefore, the abnormal clustering centers corresponding to the fraud risk and the normal clustering centers corresponding to the fraud risk are obtained based on the historical data and the clustering algorithm, unsupervised classification of all the fixation point feature vectors in the historical data is realized, and the identification error caused by artificial subjective factors is reduced.
And S14, comparing the distances between the fixation point characteristic vectors of the real-time answer videos and different clustering centers to obtain the fraud risk identification result of the target person.
In an optional embodiment, after the gaze point feature vector of the real-time answer video is obtained, an euclidean distance between the gaze point feature vector of the real-time answer video and the normal clustering center is calculated to serve as a first distance; and calculating the Euclidean distance between the fixation point characteristic vector of the real-time answer video and the abnormal clustering center as a second distance.
In this optional embodiment, the first distance and the second distance are compared to obtain a fraud risk identification result of the target person, and if the second distance is greater than the first distance, it indicates that the gaze point feature vector of the real-time answer video is close to a normal clustering center, and the fraud risk identification result of the target person is that no fraud risk exists; if the second distance is smaller than the first distance, the fact that the point of regard feature vector of the real-time answer video is close to an abnormal clustering center is represented, and the fraud risk identification result of the target person is that fraud risk exists; and if the second distance is equal to the first distance, the distance from the point of regard feature vector of the real-time answer video to the abnormal clustering center is equal to the distance from the point of regard feature vector of the real-time answer video to the normal clustering center, and in order to avoid the generation of fraud, the fraud risk identification result of the target person is recorded as the existence of fraud risk so as to remind a worker to carry out identity verification on the target person.
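A minimal sketch of this comparison, treating a tie as risky as described above, is:

```python
import numpy as np

def fraud_risk(feature_vec, normal_center, abnormal_center) -> bool:
    d_normal = np.linalg.norm(feature_vec - normal_center)      # first distance
    d_abnormal = np.linalg.norm(feature_vec - abnormal_center)  # second distance
    return d_abnormal <= d_normal                                # True -> fraud risk exists
```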
Therefore, the fraud risk identification result of the target person can be timely and quickly acquired based on the normal clustering center, the abnormal clustering center and the point of regard characteristic vector of the real-time answer video, and the fraud behavior is avoided.
According to the technical scheme, the real-time answer video of the person in the answer process can be collected, the point-of-regard characteristic vector of the real-time answer video is obtained based on the point-of-regard detection network, unsupervised cluster analysis is carried out on the point-of-regard characteristic vector and historical data, the fraud risk identification result of the person is accurately and rapidly obtained, and the fraud risk identification accuracy and timeliness are improved.
Referring to fig. 3, fig. 3 is a functional block diagram of a preferred embodiment of the fraud risk identification apparatus based on artificial intelligence according to the present application. The artificial intelligence based fraud risk identification device 11 comprises a collecting unit 110, a gaze point detection unit 111, a construction unit 112, a clustering unit 113 and an acquisition unit 114. A module/unit referred to herein is a series of computer-readable instruction segments that can be executed by the processor 13 to perform a fixed function, and that are stored in the memory 12. In this embodiment, the functions of the modules/units will be described in detail in the following embodiments.
In an optional embodiment, the collecting unit 110 is configured to collect a real-time answer video of the target person, where the real-time answer video includes face images of consecutive frames during the answer process of the target person.
In an optional embodiment, in the process of personal loan or large amount transfer and the like, a target person firstly submits a related application, then a bank preliminarily checks related information of the target person, and after the preliminary checking is passed, an agent and the target person perform an online or offline signing conversation link. In this link, the target person needs to answer the test questions in the electronic questionnaire to test whether the target person has a fraud risk. It should be noted that the application scenario described in this alternative embodiment is only an example, and is not limited thereto.
In this optional embodiment, when the target person answers the test questions in the electronic questionnaire, the target person clicks a "start answering" button on the terminal device, at this time, the image acquisition device in the terminal device is triggered, the image acquisition device starts to acquire continuous multiframe human face images of the target person at a fixed frame rate, the answering time of each test question is fixed, when answering of all the questions is completed, the image acquisition device in the terminal device stops the acquisition of the human face images, and all the human face images acquired in the answering process of the target person are used as the real-time answering video of the target person. The terminal device can be a mobile phone, a tablet computer and the like, and the application is not limited.
In this optional embodiment, since the frame rate of the image acquisition device and the response time of the test question are both fixed and unchanged, and the test questions in the electronic questionnaire are the same in the same application scene, the number of frames of the face images included in all the acquired answer videos is the same.
In an optional embodiment, the gazing point detecting unit 111 is configured to obtain a gazing point detection network, and obtain a gazing point coordinate of each frame of face image in the real-time answer video based on the gazing point detection network.
In an optional embodiment, the obtaining a gazing point detection network and obtaining a gazing point coordinate of each frame of face image in the real-time answer video based on the gazing point detection network includes:
constructing a point of regard detection initial network, wherein the point of regard detection initial network comprises a first encoder, a second encoder and a full connection layer;
training the fixation point detection initial network according to a preset loss function to obtain a fixation point detection network, wherein the input of the fixation point detection network is a human face image, and the output of the fixation point detection network is the fixation point coordinate of the human face image;
and sequentially inputting all the face images in the real-time answer video into the fixation point detection network to obtain the fixation point coordinates of each frame of face image.
In this optional embodiment, an initial gaze point detection network is established. The input of the initial network is a face image from an answer video, and the expected output is the gaze point coordinates of the face image, which comprise coordinate positions in two dimensions, the horizontal direction x and the vertical direction y. The position of the gaze point in a face image is related not only to the detail features of the eye region but also to the global features of the face region, so the initial gaze point detection network comprises a first encoder, a second encoder and a fully connected layer: the first encoder is used to extract the detail features of the eye region, and the second encoder is used to extract the global features of the face region. The first encoder extracts the features of the eye region in the input face image to obtain a first feature map of size w1 × h1 × c1, where w1 × h1 is the width and height of the first feature map and c1 is its number of image channels; the size of the first feature map is related to the structure of the first encoder. The second encoder extracts the features of the face region in the input face image to obtain a second feature map of size w2 × h2 × c2, where w2 × h2 is the width and height of the second feature map and c2 is its number of image channels; the size of the second feature map is related to the structure of the second encoder. All values in the first feature map and the second feature map are concatenated into a one-dimensional vector, which is input to the fully connected layer as a fusion vector, and the output of the fully connected layer is the coordinates of the gaze point in the input face image; a structural diagram of the initial gaze point detection network is shown in fig. 2. The fully connected layer comprises an input layer, hidden layers and an output layer; the number of neurons in the input layer equals the number of values in the fusion vector, the number of neurons in the output layer is 2, and the number of hidden layers and the number of neurons in each hidden layer are not limited in this application. The first encoder and the second encoder may adopt existing encoder structures such as ResNet or DenseNet, and this application is not limited thereto.
In this optional embodiment, the gaze point detection initial network is trained according to a preset loss function, parameters in the gaze point detection initial network are continuously updated, the output of the gaze point detection initial network is constrained to be coordinates of a gaze point in a face image, and the gaze point detection network is obtained. The training process of the initial network for detecting the gaze point is as follows:
a1, collecting a large number of face images in historical data as training data, and acquiring label data of each face image in the training data, wherein the label data comprises an eye area label image, a face area label image and a fixation point coordinate label.
In this optional embodiment, the pixel value of the eye region in the face image is set to 1, and the pixel values of other regions are set to 0, so as to obtain an eye region label map of each face image in the training data; setting the pixel value of a face area in the face image to be 1, and setting the pixel values of other areas to be 0, so as to obtain a face area label image of each face image in the training data; and acquiring the fixation point coordinate of each face image in the training data as a fixation point coordinate label of the face image, wherein the fixation point coordinate is acquired in a manual labeling mode.
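As a hedged illustration of this label construction, the sketch below builds the two binary region label maps and a gaze point coordinate label for one face image; the 224×224 resolution and the rectangular region bounds are invented for the example, since the source only states that region pixels are set to 1 and all other pixels to 0, with the gaze point annotated manually.

```python
import numpy as np

def make_region_label(height, width, top, left, bottom, right):
    """Return a height x width label map with 1 inside the region and 0 elsewhere."""
    label = np.zeros((height, width), dtype=np.float32)
    label[top:bottom, left:right] = 1.0
    return label

# Hypothetical annotations for a single 224 x 224 training image.
eye_label = make_region_label(224, 224, top=80, left=60, bottom=110, right=170)
face_label = make_region_label(224, 224, top=40, left=40, bottom=200, right=190)
gaze_label = (120.0, 95.0)  # manually annotated gaze point coordinate (x, y)
```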
A2, acquiring a first saliency map based on a pixel value in the first feature map, wherein the pixel point in the first saliency map corresponds to the pixel point in the input face image one by one, and the pixel value of the pixel point in the first saliency map represents the saliency of the feature of the pixel point in the input face image in the first feature map; in order to enable a first encoder to extract detail features of an eye region in an input face image, a first preset loss function is constructed based on the first saliency map and the eye region label map, the first preset loss function can constrain pixel points with pixel values larger than 0.5 in the first saliency map to be located in the eye region, and the first preset loss function satisfies the relation:
wherein I(i,j) is the pixel value of the pixel point (i, j) in the eye region label map of the input face image of the gaze point detection initial network; the pixel value of the pixel point (i, j) in the first saliency map has a value range of [0, 1]; W×H is the width and height of the input face image; Loss1 is the value of the first preset loss function; f(X) is a custom function, and the custom function satisfies the relation:
in this alternative embodiment, the obtaining the first saliency map based on the pixel values in the first feature map includes:
calculating the mean value of the pixel values in each image channel of the first feature map; the first feature map has a size of w₁×h₁×c₁ and thus contains c₁ image channels in total, each image channel being of size w₁×h₁, so that c₁ mean values are obtained in total;
normalizing all the mean values to obtain a weight factor for each image channel; taking image channel c₁ as an example, its weight factor is the mean pixel value of image channel c₁ divided by the sum of the mean pixel values of all image channels, i.e.:

weight factor of image channel c₁ = mean pixel value of image channel c₁ / sum of the mean pixel values of all image channels;
performing a weighted summation over all image channels based on the weight factors to obtain an initial saliency image, wherein the size of the initial saliency image is w₁×h₁, equal to the size of each image channel;
and performing interpolation processing on the initial salient image based on an interpolation algorithm to obtain a first salient image, wherein the first salient image is equal to the input face image of the initial gaze point detection network in size, and the interpolation algorithm can adopt a linear interpolation algorithm.
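A compact sketch of this saliency-map construction, assuming a PyTorch feature map of shape (c1, h1, w1); per-channel means become normalized weight factors, the channels are weighted-summed into the initial saliency image, and bilinear interpolation (one possible linear interpolation algorithm) resizes it to the input image size.

```python
import torch
import torch.nn.functional as F

def first_saliency_map(feature_map, out_h, out_w):
    """feature_map: tensor of shape (c1, h1, w1) produced by the first encoder."""
    channel_means = feature_map.mean(dim=(1, 2))            # one mean per image channel
    weights = channel_means / channel_means.sum()           # normalized weight factors
    initial = (weights[:, None, None] * feature_map).sum(dim=0)  # h1 x w1 initial saliency image
    resized = F.interpolate(initial[None, None], size=(out_h, out_w),
                            mode="bilinear", align_corners=False)
    return resized[0, 0]   # first saliency map, same size as the input face image
```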
A3, according to the same method, acquiring a second saliency map based on a pixel value in the second feature map, wherein the pixel point in the second saliency map corresponds to the pixel point in the input face image one by one, and the acquisition method of the second saliency map is the same as that of the first saliency map; in order to enable the second encoder to extract the global features of the face region in the input face image, a second preset loss function is constructed based on the second saliency map and the face region label map, the second preset loss function may constrain that pixel points with pixel values greater than 0.5 in the second saliency map are all located in the face region, and the second preset loss function satisfies the relation:
wherein the relation uses the pixel value of the pixel point (i, j) in the face region label map of the input face image of the gaze point detection initial network, and the pixel value of the pixel point (i, j) in the second saliency map, whose value range is [0, 1]; W×H is the width and height of the input face image; Loss2 is the value of the second preset loss function; f(X) is a custom function, and the custom function satisfies the relation:
a4, constructing a third preset loss function based on the output result of the gaze point detection initial network and the gaze point coordinate label, wherein the third preset loss function can constrain the output of the gaze point detection initial network as the coordinates of the gaze point in the face image, and the third preset loss function satisfies the relation:
Loss3 = (x* - x)² + (y* - y)²

wherein x*, y* are the gaze point coordinate label of the input face image of the gaze point detection initial network; x, y are the output result of the gaze point detection initial network; Loss3 is the value of the third preset loss function.
And A5, adding the first preset loss function, the second preset loss function and the third preset loss function together to serve as the preset loss function, and training the fixation point detection initial network based on the preset loss function and the training data: the face images in the training data are continuously input into the fixation point detection initial network to obtain the value of the preset loss function; after the value of the preset loss function is obtained in each iteration, the network parameters are updated by a gradient descent method; the reduction of the preset loss function between two adjacent iterations is calculated, and training is stopped to obtain the fixation point detection network when this reduction is smaller than a preset threshold. The preset threshold is 0.001.
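The training step A5 can be sketched as below, under stated assumptions: PyTorch, a hypothetical `region_loss(feature_map, label_map)` helper standing in for the first and second preset loss functions (their exact formulas are given as images in the original and are not reproduced here), and the squared coordinate error of Loss3; training stops once the decrease of the total loss between two adjacent iterations falls below the 0.001 threshold.

```python
def train_gaze_net(model, loader, optimizer, region_loss, threshold=1e-3):
    """One-pass sketch of step A5; `region_loss` is a placeholder for Loss1/Loss2."""
    prev_loss = None
    for images, eye_labels, face_labels, gaze_labels in loader:
        pred, f1, f2 = model(images)
        loss3 = ((pred - gaze_labels) ** 2).sum(dim=1).mean()       # (x*-x)^2 + (y*-y)^2
        loss = region_loss(f1, eye_labels) + region_loss(f2, face_labels) + loss3
        optimizer.zero_grad()
        loss.backward()                  # gradient descent update of the network parameters
        optimizer.step()
        if prev_loss is not None and prev_loss - loss.item() < threshold:
            break                        # loss decrease between adjacent iterations below 0.001
        prev_loss = loss.item()
    return model
```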
In this optional embodiment, all face images in the real-time answer video are sequentially input to the gaze point detection network to obtain the gaze point coordinates of each frame of face image.
In an alternative embodiment, the constructing unit 112 is configured to construct the gazing point feature vector of the real-time answer video based on the gazing point coordinates of the face image.
In an optional embodiment, a timestamp of each frame of face image in a real-time answer video of a target person is obtained, and the timestamp reflects the acquisition time of the face image; and arranging the fixation point coordinates of each frame of face image according to the sequence of the time stamps to obtain the fixation point characteristic vector of the real-time answer video.
Exemplarily, assuming that the real-time answer video of the target person includes 3 frames of face images, the fixation point coordinates of the 3 frames of face images are obtained in the order of their timestamps as (x₁, y₁), (x₂, y₂), (x₃, y₃); the fixation point feature vector of the real-time answer video is then (x₁, y₁, x₂, y₂, x₃, y₃).
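A small sketch of this construction: the frames are ordered by timestamp and their gaze point coordinates are concatenated into a single feature vector (plain Python, illustrative data).

```python
def gaze_feature_vector(frames):
    """frames: list of (timestamp, (x, y)) pairs, one entry per face image."""
    vector = []
    for _, (x, y) in sorted(frames, key=lambda item: item[0]):
        vector.extend([x, y])
    return vector

# Three frames arriving out of order -> (x1, y1, x2, y2, x3, y3) in timestamp order.
print(gaze_feature_vector([(2, (0.5, 0.4)), (1, (0.3, 0.2)), (3, (0.6, 0.7))]))
```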
In an optional embodiment, the clustering unit 113 is configured to collect the gaze point feature vectors of all answer videos in the historical data to construct a gaze point feature set, and perform cluster analysis on the gaze point feature set based on a clustering algorithm to obtain a clustering center, where the clustering center reflects an average feature of each category.
In an optional embodiment, all answer videos in the historical data are collected, and the point-of-regard feature vector of each answer video is obtained according to the same method to construct a point-of-regard feature set; because the number of frames of the face images contained in all the answer videos is the same, the number of numerical values contained in the gazing point characteristic vectors of different answer videos is the same.
In this optional embodiment, the performing, based on a clustering algorithm, a clustering analysis on the gaze point feature set to obtain a clustering center, where the clustering center reflects an average feature of each category, includes:
dividing the gazing point characteristic vectors in the gazing point characteristic set into two clusters based on a clustering algorithm, and calculating clustering centers of the two clusters, wherein the clusters comprise all gazing point characteristic vectors belonging to the same class;
and counting the number of the gazing point feature vectors in each cluster, taking the cluster center of the cluster with the smaller number as an abnormal cluster center, and taking the cluster center of the cluster with the larger number as a normal cluster center.
In this optional embodiment, all the gaze point feature vectors in the gaze point feature set can be divided into two categories, namely normal feature vectors without fraud risk and abnormal feature vectors with fraud risk, and the clustering algorithm may adopt the distance-based K-means algorithm. Accordingly, the dividing the gaze point feature vectors in the gaze point feature set into two clusters based on a clustering algorithm and calculating the clustering centers of the two clusters, where a cluster includes all gaze point feature vectors belonging to the same category, includes:
a1, firstly setting the classification number of a clustering algorithm as K =2, and randomly selecting two fixation point feature vectors as two initial clustering centers from the fixation point feature set;
a2, calculating Euclidean distance between each point of regard feature vector and each initial clustering center in the point of regard feature set, and distributing the point of regard feature vector to a class corresponding to the initial clustering center with the minimum distance to obtain two class clusters, wherein the class clusters comprise all point of regard feature vectors belonging to the same class, and the class clusters are in one-to-one correspondence with the initial clustering centers;
a3, calculating the mean value of all the gaze point feature vectors in the same cluster to obtain the mean value center of each cluster, taking the mean value center as the updated initial clustering center of the corresponding cluster to complete the updating of the initial clustering center, wherein each cluster corresponds to one initial clustering center before updating and one initial clustering center after updating;
a4, calculating the Euclidean distance between the updated initial clustering center and the initial clustering center before updating in each cluster as a deviation distance, and calculating the sum of the deviation distances of all clusters as a total deviation distance; if the total deviation distance is greater than a preset threshold value, repeatedly executing step a2 and step a3 to continue updating the initial clustering centers; if the total deviation distance is not greater than the preset threshold value, stopping updating and taking the updated initial clustering center obtained in the last updating process as a clustering center; since the classification number K = 2 of the clustering algorithm, two clustering centers and a cluster corresponding to each clustering center are obtained, and the clustering centers reflect the average feature of each category. The preset threshold value is 0.5.
In this optional embodiment, the number of gaze point feature vectors in each cluster is counted; since fraud risk is a minority case in an actual scene, the cluster with the smaller number is taken as an abnormal cluster with fraud risk and its cluster center as the abnormal clustering center, while the cluster with the larger number is taken as a normal cluster without fraud risk and its cluster center as the normal clustering center.
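Steps a1 to a4 and the cluster labelling above amount to a two-class K-means followed by a size comparison. The sketch below is one possible NumPy rendering, assuming the gaze point feature set is a 2-D array and that neither cluster becomes empty during updating; the 0.5 stopping threshold is the one stated above.

```python
import numpy as np

def two_class_kmeans(features, shift_threshold=0.5, seed=0):
    """features: array of shape (n_videos, vector_length); returns (normal_center, abnormal_center)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=2, replace=False)]            # a1: two random initial centers
    while True:
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)  # a2: Euclidean distances
        labels = dists.argmin(axis=1)
        new_centers = np.stack([features[labels == k].mean(axis=0) for k in (0, 1)])  # a3: mean centers
        total_shift = np.linalg.norm(new_centers - centers, axis=1).sum()            # a4: total deviation distance
        centers = new_centers
        if total_shift < shift_threshold:
            break
    counts = np.bincount(labels, minlength=2)
    normal, abnormal = counts.argmax(), counts.argmin()  # smaller cluster treated as the fraud-risk one
    return centers[normal], centers[abnormal]
```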
In an optional embodiment, the obtaining unit 114 compares distances between the gazing point feature vector of the real-time answer video and different clustering centers to obtain a fraud risk identification result of the target person.
In an optional embodiment, after the gaze point feature vector of the real-time answer video is obtained, an euclidean distance between the gaze point feature vector of the real-time answer video and the normal clustering center is calculated to serve as a first distance; and calculating the Euclidean distance between the gaze point characteristic vector of the real-time answer video and the abnormal clustering center to serve as a second distance.
In this optional embodiment, the first distance and the second distance are compared to obtain the fraud risk identification result of the target person. If the second distance is greater than the first distance, the gaze point feature vector of the real-time answer video is closer to the normal clustering center, and the fraud risk identification result of the target person is that no fraud risk exists. If the second distance is smaller than the first distance, the gaze point feature vector of the real-time answer video is closer to the abnormal clustering center, and the fraud risk identification result of the target person is that fraud risk exists. If the second distance is equal to the first distance, the gaze point feature vector of the real-time answer video is equally distant from the abnormal clustering center and the normal clustering center; to avoid letting fraud slip through, the fraud risk identification result of the target person is recorded as the existence of fraud risk, so as to remind a worker to carry out identity verification on the target person.
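The decision rule can be summarized in a few lines; the sketch below assumes NumPy arrays for the feature vector and the two clustering centers, and, as described above, treats the tie case as fraud risk so that a worker re-checks the person's identity.

```python
import numpy as np

def fraud_risk_result(feature_vector, normal_center, abnormal_center):
    first_distance = np.linalg.norm(np.asarray(feature_vector) - normal_center)     # to the normal center
    second_distance = np.linalg.norm(np.asarray(feature_vector) - abnormal_center)  # to the abnormal center
    # A greater second distance means the vector lies closer to the normal center.
    return "no fraud risk" if second_distance > first_distance else "fraud risk"
```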
According to the technical scheme, the real-time answer video of the person in the answer process can be collected, the point-of-regard characteristic vector of the real-time answer video is obtained based on the point-of-regard detection network, unsupervised cluster analysis is conducted on the point-of-regard characteristic vector and historical data, the fraud risk identification result of the person is accurately and quickly obtained, and the accuracy and the timeliness of fraud risk identification are improved.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 1 comprises a memory 12 and a processor 13. The memory 12 is used for storing computer readable instructions, and the processor 13 is used for executing the computer readable instructions stored in the memory to implement the artificial intelligence based fraud risk identification method according to any of the above embodiments.
In an alternative embodiment, the electronic device 1 further comprises a bus, a computer program stored in said memory 12 and executable on said processor 13, such as an artificial intelligence based fraud risk identification program.
Fig. 4 only shows the electronic device 1 with the memory 12 and the processor 13, and it will be understood by those skilled in the art that the structure shown in fig. 4 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
In connection with fig. 1, the memory 12 in the electronic device 1 stores a plurality of computer-readable instructions to implement an artificial intelligence based fraud risk identification method, and the processor 13 executes the plurality of instructions to implement:
acquiring a real-time answer video of a target person, wherein the real-time answer video comprises continuous multiframe face images in the answer process of the target person;
acquiring a fixation point detection network, and acquiring fixation point coordinates of each frame of face image in the real-time answer video based on the fixation point detection network;
constructing a fixation point characteristic vector of the real-time answer video based on the fixation point coordinates of the face image;
collecting the fixation point characteristic vectors of all answer videos in historical data to construct a fixation point characteristic set, and carrying out cluster analysis on the fixation point characteristic set based on a clustering algorithm to obtain a clustering center, wherein the clustering center reflects the average characteristic of each category;
and comparing the distances between the gazing point characteristic vectors of the real-time answer videos and different clustering centers to obtain the fraud risk identification result of the target person.
Specifically, the processor 13 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the instruction, which is not described herein again.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not constitute a limitation of it; the electronic device 1 may have a bus-type structure or a star-shaped structure, and may include more or fewer hardware or software components than shown in the figure, or a different arrangement of components; for example, the electronic device 1 may further include an input/output device, a network access device, and the like.
It should be noted that the electronic device 1 is only an example; other existing or future electronic products that can be adapted to the present application should also be included in the scope of protection of the present application and are incorporated herein by reference.
The processor 13 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same function or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 13 is the control unit (Control Unit) of the electronic device 1; it connects the various components of the whole electronic device 1 by using various interfaces and lines, and executes various functions of the electronic device 1 and processes its data by running or executing programs or modules stored in the memory 12 (for example, executing the artificial intelligence based fraud risk identification program) and calling data stored in the memory 12.
The processor 13 executes an operating system of the electronic device 1 and various installed application programs. The processor 13 executes the application program to implement the steps of the above-described embodiments of artificial intelligence based fraud risk identification methods, such as the steps shown in fig. 1.
Illustratively, the computer program may be partitioned into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to accomplish the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the electronic device 1. For example, the computer program may be divided into an acquisition unit 110, a point of regard detection unit 111, a construction unit 112, a clustering unit 113, an acquisition unit 114.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a Processor (Processor) to execute the parts of the artificial intelligence based fraud risk identification method according to the embodiments of the present application.
The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the processes in the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and executed by a processor, to implement the steps of the embodiments of the methods described above.
Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U-disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), and other memories, etc.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one arrow is shown in FIG. 4, but this does not indicate only one bus or one type of bus. The bus is arranged to enable connection communication between the memory 12 and at least one processor 13 or the like.
Embodiments of the present application further provide a computer-readable storage medium (not shown), in which computer-readable instructions are stored, and the computer-readable instructions are executed by a processor in an electronic device to implement the artificial intelligence based fraud risk identification method according to any of the above embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the specification may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present application and not for limiting, and although the present application is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application.
Claims (10)
1. An artificial intelligence based fraud risk identification method, characterized in that the method comprises:
acquiring a real-time answer video of a target person, wherein the real-time answer video comprises face images of continuous multiple frames in the answer process of the target person;
acquiring a fixation point detection network, and acquiring fixation point coordinates of each frame of face image in the real-time answer video based on the fixation point detection network;
constructing a fixation point characteristic vector of the real-time answer video based on the fixation point coordinates of the face image;
collecting the fixation point characteristic vectors of all answer videos in historical data to construct a fixation point characteristic set, and carrying out cluster analysis on the fixation point characteristic set based on a clustering algorithm to obtain a clustering center, wherein the clustering center reflects the average characteristic of each category;
and comparing the distances between the fixation point characteristic vectors of the real-time answer video and different clustering centers to obtain the fraud risk identification result of the target person.
2. The artificial intelligence based fraud risk identification method of claim 1, wherein the obtaining a point-of-regard detection network and obtaining a point-of-regard coordinate of each frame of face image in the real-time answer video based on the point-of-regard detection network comprises:
constructing a point of regard detection initial network, wherein the point of regard detection initial network comprises a first encoder, a second encoder and a full connection layer;
training the fixation point detection initial network according to a preset loss function to obtain a fixation point detection network, wherein the input of the fixation point detection network is a human face image, and the output of the fixation point detection network is the fixation point coordinate of the human face image;
and sequentially inputting all face images in the real-time answer video into the fixation point detection network to obtain the fixation point coordinates of each frame of face image.
3. The artificial intelligence based fraud risk recognition method of claim 2, wherein the training the initial gaze point detection network according to a preset loss function to obtain a gaze point detection network comprises:
a1, collecting a large number of face images in historical data as training data, and acquiring label data of each face image in the training data, wherein the label data comprises an eye area label image, a face area label image and a fixation point coordinate label;
a2, acquiring a first saliency map based on pixel values in a first feature map, and constructing a first preset loss function based on the first saliency map and the eye region label map, where the first feature map is an output result of the first encoder, and the first preset loss function satisfies a relation:
wherein I(i,j) is the pixel value of the pixel point (i, j) in the eye region label map of the input face image of the fixation point detection initial network; the pixel value of the pixel point (i, j) in the first saliency map has a value range of [0, 1]; W×H is the width and height of the input face image; Loss1 is the value of the first preset loss function; f(X) is a custom function, and the custom function satisfies the relation:
a3, obtaining a second saliency map based on pixel values in a second feature map, and constructing a second preset loss function based on the second saliency map and the face region label map, where the second feature map is an output result of the second encoder, and the second preset loss function satisfies a relation:
wherein the relation uses the pixel value of the pixel point (i, j) in the face region label map of the input face image of the fixation point detection initial network, and the pixel value of the pixel point (i, j) in the second saliency map, whose value range is [0, 1]; W×H is the width and height of the input face image; Loss2 is the value of the second preset loss function; f(X) is a custom function, and the custom function satisfies the relation:
a4, constructing a third preset loss function based on the output result of the gaze point detection initial network and the gaze point coordinate label, wherein the third preset loss function satisfies the relation:
Loss3 = (x* - x)² + (y* - y)²

wherein x*, y* are the fixation point coordinate label of the input face image of the fixation point detection initial network, x and y are the output result of the fixation point detection initial network, and Loss3 is the value of the third preset loss function;
and A5, adding the first preset loss function, the second preset loss function and the third preset loss function to serve as the preset loss function, and training the initial network for detecting the fixation point based on the preset loss function and the training data to obtain the network for detecting the fixation point.
4. The artificial intelligence based fraud risk recognition method of claim 3, wherein the collecting a large number of face images in the historical data as training data, and obtaining label data of each face image in the training data, the label data including an eye area label map, a face area label map and a gazing point coordinate label, comprises:
setting the pixel value of the eye region in the face image to be 1, and setting the pixel values of other regions to be 0, so as to obtain an eye region label image of each face image in the training data;
setting the pixel value of a face area in the face image to be 1, and setting the pixel values of other areas to be 0, so as to obtain a face area label image of each face image in the training data;
and acquiring the fixation point coordinate of each face image in the training data as the fixation point coordinate label of the face image.
5. The artificial intelligence based fraud risk identification method of claim 3 wherein said obtaining a first saliency map based on pixel values in a first feature map comprises:
calculating the mean value of pixel values in each image channel in the first feature map;
normalizing all the mean values to obtain a weight factor of each image channel, wherein the weight factor of an image channel satisfies the following relation:

weight factor of image channel c₁ = mean pixel value of image channel c₁ / sum of the mean pixel values of all image channels;
performing weighted summation on all image channels based on the weighting factors to obtain an initial significant image, wherein the size of the initial significant image is equal to that of each image channel;
and carrying out interpolation processing on the initial salient image based on an interpolation algorithm to obtain a first salient image, wherein the first salient image is equal to the input face image of the initial gaze point detection network in size.
6. The artificial intelligence based fraud risk identification method of claim 1, wherein the performing cluster analysis on the fixation point feature set based on a clustering algorithm to obtain a clustering center, the clustering center reflecting the average feature of each category, comprises:
dividing the gazing point characteristic vectors in the gazing point characteristic set into two clusters based on a clustering algorithm, and calculating clustering centers of the two clusters, wherein the clusters comprise all gazing point characteristic vectors belonging to the same class;
and counting the number of the gazing point feature vectors in each cluster, taking the cluster center of the cluster with the smaller number as an abnormal cluster center, and taking the cluster center of the cluster with the larger number as a normal cluster center.
7. The artificial intelligence based fraud risk recognition method of claim 1, wherein the comparing distances of the gazing point feature vector of the real-time answer video to different cluster centers to obtain the fraud risk recognition result of the target person comprises:
calculating Euclidean distance between the fixation point characteristic vector of the real-time answer video and the normal clustering center as a first distance;
calculating Euclidean distance between the fixation point characteristic vector of the real-time answer video and the abnormal clustering center as a second distance;
comparing the first distance with the second distance to obtain a fraud risk identification result of the target person, wherein if the second distance is greater than the first distance, the fraud risk identification result of the target person is that no fraud risk exists; and if the second distance is not greater than the first distance, the fraud risk identification result of the target person is that fraud risk exists.
8. An artificial intelligence based fraud risk identification apparatus, characterized in that the apparatus comprises:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a real-time answer video of a target person, and the real-time answer video comprises continuous multiframe face images in the answer process of the target person;
the point of regard detection unit is used for obtaining a point of regard detection network and obtaining the point of regard coordinate of each frame of face image in the real-time answer video based on the point of regard detection network;
the construction unit is used for constructing the fixation point characteristic vector of the real-time answer video based on the fixation point coordinates of the face image;
the system comprises a clustering unit, a processing unit and a processing unit, wherein the clustering unit is used for collecting the fixation point characteristic vectors of all answer videos in historical data to construct a fixation point characteristic set, and carrying out clustering analysis on the fixation point characteristic set based on a clustering algorithm to obtain a clustering center, and the clustering center reflects the average characteristic of each category;
and the obtaining unit is used for comparing the distances between the gazing point characteristic vectors of the real-time answer videos and different clustering centers to obtain the fraud risk recognition result of the target person.
9. An electronic device, characterized in that the electronic device comprises:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the artificial intelligence based fraud risk identification method of any of claims 1-7.
10. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, implement the artificial intelligence based fraud risk identification method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210902732.4A CN115222427A (en) | 2022-07-29 | 2022-07-29 | Artificial intelligence-based fraud risk identification method and related equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210902732.4A CN115222427A (en) | 2022-07-29 | 2022-07-29 | Artificial intelligence-based fraud risk identification method and related equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115222427A true CN115222427A (en) | 2022-10-21 |
Family
ID=83613550
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210902732.4A Pending CN115222427A (en) | 2022-07-29 | 2022-07-29 | Artificial intelligence-based fraud risk identification method and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115222427A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116825169A (en) * | 2023-08-31 | 2023-09-29 | 悦芯科技股份有限公司 | Abnormal memory chip detection method based on test equipment |
CN117649157A (en) * | 2024-01-30 | 2024-03-05 | 中国人民解放军空军军医大学 | Instrument discrimination capability assessment method based on sight tracking |
CN118260789A (en) * | 2024-05-30 | 2024-06-28 | 江苏西欧电子有限公司 | Electric energy meter data storage method and system based on data analysis |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116825169A (en) * | 2023-08-31 | 2023-09-29 | 悦芯科技股份有限公司 | Abnormal memory chip detection method based on test equipment |
CN116825169B (en) * | 2023-08-31 | 2023-11-24 | 悦芯科技股份有限公司 | Abnormal memory chip detection method based on test equipment |
CN117649157A (en) * | 2024-01-30 | 2024-03-05 | 中国人民解放军空军军医大学 | Instrument discrimination capability assessment method based on sight tracking |
CN117649157B (en) * | 2024-01-30 | 2024-03-29 | 中国人民解放军空军军医大学 | Instrument discrimination capability assessment method based on sight tracking |
CN118260789A (en) * | 2024-05-30 | 2024-06-28 | 江苏西欧电子有限公司 | Electric energy meter data storage method and system based on data analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108197532B (en) | The method, apparatus and computer installation of recognition of face | |
CN115222427A (en) | Artificial intelligence-based fraud risk identification method and related equipment | |
CN110222554A (en) | Cheat recognition methods, device, electronic equipment and storage medium | |
CN107742100B (en) | A kind of examinee's auth method and terminal device | |
CN112541443B (en) | Invoice information extraction method, invoice information extraction device, computer equipment and storage medium | |
CN110502694A (en) | Lawyer's recommended method and relevant device based on big data analysis | |
CN115063632B (en) | Vehicle damage identification method, device, equipment and medium based on artificial intelligence | |
CN115063589A (en) | Knowledge distillation-based vehicle component segmentation method and related equipment | |
CN112668453B (en) | Video identification method and related equipment | |
CN112132030A (en) | Video processing method and device, storage medium and electronic equipment | |
CN115409638A (en) | Artificial intelligence-based livestock insurance underwriting and claim settlement method and related equipment | |
CN114860742A (en) | Artificial intelligence-based AI customer service interaction method, device, equipment and medium | |
CN113269179B (en) | Data processing method, device, equipment and storage medium | |
CN114639152A (en) | Multi-modal voice interaction method, device, equipment and medium based on face recognition | |
CN109801394B (en) | Staff attendance checking method and device, electronic equipment and readable storage medium | |
CN116205723A (en) | Artificial intelligence-based face tag risk detection method and related equipment | |
CN113284137B (en) | Paper fold detection method, device, equipment and storage medium | |
CN116543460A (en) | Space-time action recognition method based on artificial intelligence and related equipment | |
CN116012952A (en) | Behavior recognition method based on artificial intelligence and related equipment | |
CN110489438A (en) | A kind of customer action information processing method and device | |
CN115169360A (en) | User intention identification method based on artificial intelligence and related equipment | |
CN114627533A (en) | Face recognition method, face recognition device, face recognition equipment and computer-readable storage medium | |
CN114757729A (en) | Transaction request processing method and device, terminal equipment and storage medium | |
CN113920564A (en) | Client mining method based on artificial intelligence and related equipment | |
CN114820409A (en) | Image anomaly detection method and device, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |