WO2022081553A4

WO2022081553A4 - Systems and methods for providing a systemic error in artificial intelligence algorithms

Info

Publication number: WO2022081553A4
Application number: PCT/US2021/054542
Authority: WO
Inventors: Gharib GHARIBI; Babak Poorebrahim GILKALAYE; Riddhiman Das
Original assignee: TripleBlind, Inc.
Priority date: 2020-10-13
Filing date: 2021-10-12
Publication date: 2022-06-16
Also published as: EP4229554A1; EP4229554A4; CA3195434A1; WO2022081553A1

Abstract

Disclosed is a process for testing a suspect model to determine whether it was derived from a source model. An example method includes receiving, from a model owner node, a source model and a fingerprint associated with the source model, receiving a suspect model at a service node, based on a request to test the suspect model, applying the fingerprint to the suspect model to generate an output and, when the output has an accuracy that is equal to or greater than a threshold, determining that the suspect model is derived from the source model. Imperceptible noise can be used to generate the fingerprint which can cause predictable outputs from the source model and a potential derivative thereof.

Claims

AMENDED CLAIMS received by the International Bureau on 02 MAY 2022 (02.05.2022)

1. (Original) A method comprising: generating, based on a training dataset, a reference model and a surrogate model; selecting datapoints from the training dataset that are predicted correctly by a source model as a group of adversarial candidates; selecting, from the group of adversarial candidates, a sub-group of candidates that each have a low confidence score according to a threshold to yield a sub-group of adversarial candidates; adding noise to each candidate of the sub-group of adversarial candidates to yield a noisy group of adversarial examples; testing the noisy group of adversarial examples against the source model to obtain a set of source model outputs that the source model predicts correctly; testing the set of source model successful adversarial examples against the reference model to yield a set of reference model outputs; testing the set of source model successful adversarial examples against the surrogate model to yield a set of surrogate model outputs; and identifying a set of fingerprints based on which ones from the set of source model outputs, the set of reference model outputs and the set of surrogate model outputs pass as adversarial examples against the source model and the surrogate model, but not the reference model.

2. (Original) The method of claim 1, wherein the training dataset is from a same distribution of a source model dataset.

3. (Original) The method of claim 1, wherein the training dataset comprises at least some data from a source model dataset.

38

AMENDED SHEET (ARTICLE 19) Application/ Control Number: PCT/US2021/054542 Docket No.: 213-0106-PCT

4. (Original) The method of claim 1, wherein fingerprint candidates comprise ones of the noisy group of adversarial examples that lead to a fully successful adversarial attack accuracy against the source model.

5. (Original) The method of claim 1, further comprising: sharing a hashed version of the set of fingerprints with a trusted third party.

6. (Original) The method of claim 1, wherein generating the set of fingerprints further comprises constructing respective adversarial examples with the noise that causes a receiving model to misclassify an input.

7. (Original) The method of claim 6, wherein the noise is imperceptible noise.

8. (Original) The method of claim 1, further comprising: testing a suspect model by using the set of fingerprints against the suspect model to determine whether an overall accuracy operating on the set of fingerprints is equal to or greater than a testing threshold.

9. (Original) The method of claim 8, further comprising: determining, when the overall accuracy operating on the set of fingerprints is equal to or greater than the testing threshold, that the suspect model was derived from the source model.

10. (Original) A method comprising: receiving, from a model owner node, a source model and verification key at a service node; receiving a suspect model at the service node; transmitting a request to the model owner node for a proof of ownership relative to the suspect model; in response to the request, receiving a marking key at the service node from the model owner node; and

39

AMENDED SHEET (ARTICLE 19) Application/ Control Number: PCT/US2021/054542 Docket No.: 213-0106-PCT based on the marking key and the verification key, determining whether the suspect model was derived from the source model.

11. (Original) The method of claim 10, wherein determining whether the suspect model was derived from the source model further comprises testing the suspect model to determine whether a fingerprint produces a same output from both the source model and the suspect model.

12. (Original) The method of claim 11, wherein the fingerprint passes against the source model and a surrogate model, but not a reference model.

13. (Original) The method of claim 10, wherein at least one of the marking key and the verification key comprises added noise which causes a predictable output from the source model and surrogate models derived therefrom.

14. (Currently Amended) A method comprising: receiving, from a model owner node, a source model and a fingerprint associated with the source model; receiving a suspect model at a service node; based on a request to test the suspect model, applying the fingerprint to the suspect model to generate an output; and when the output has an accuracy that is equal to or greater than a threshold, determining that the suspect model is derived from the source model, wherein the fingerprint is generated by a process comprising: generating, based on a training dataset, a reference model and a surrogate model; selecting datapoints from the training dataset that are predicted correctly by the source model as a group of adversarial candidates; selecting from the group of adversarial candidates a sub-group of candidates that each have a low confidence score according to a threshold to yield a sub-group of adversarial candidates;

40

AMENDED SHEET (ARTICLE 19) Application/ Control Number: PCT/US2021/054542 Docket No.: 213-0106-PCT adding noise to each candidate of the sub-group of adversarial candidates to yield a noisy group of adversarial examples; testing the noisy group of adversarial examples against the source model to obtain a set of source model successful adversarial examples that the source model predicts correctly; testing the set of source model successful adversarial examples against the reference model to yield a set of reference model outputs; testing the set of source model successful adversarial examples against the surrogate model to yield a set of surrogate model outputs; and identifying the fingerprint based on which ones from the set of source model successful adversarial examples, the set of reference model outputs and the set of surrogate model outputs pass as adversarial examples against the source model and the surrogate model, but not the reference model.

15. (Cancelled) The method of claim 14, wherein the fingerprint is generated by a process comprising: generating, based on a training dataset, a reference model and a surrogate model; selecting datapoints from the training dataset that are predicted correctly by the source model as a group of adversarial candidates; selecting from the group of adversarial candidates a sub-group of candidates that each have a low confidence score according to a threshold to yield a sub-group of adversarial candidates; adding noise to each candidate of the sub-group of adversarial candidates to yield a noisy group of adversarial examples; testing the noisy group of adversarial examples against the source model to obtain a set of source model successful adversarial examples that the source model predicts correctly; testing the set of source model successful adversarial examples against the reference model to yield a set of reference model outputs;

41

AMENDED SHEET (ARTICLE 19) Application/ Control Number: PCT/US2021/054542 Docket No.: 213-0106-PCT testing the set of source model successful adversarial examples against the surrogate model to yield a set of surrogate model outputs; and identifying the fingerprint based on which ones from the set of source model successful adversarial examples, the set of reference model outputs and the set of surrogate model outputs pass as adversarial examples against the source model and the surrogate model, but not the reference model.

16. (Currently Amended) The method of claim 14, wherein the noisy group of adversarial examples comprises ones of the group of adversarial candidates that lead to a fully successful adversarial attack accuracy against the source model.

17. (Currently Amended) The method of claim 14, further comprising: sharing a hashed version of the fingerprint with a trusted third party.

18. (Currently Amended) The method of claim 14, wherein generating the fingerprint further comprises constructing respective adversarial examples with the noise that causes a receiving model to misclassify an input.

19. (Currently Amended) The method of claim 18, wherein the noise is imperceptible noise.

20. (Currently Amended) The method of claim 14, wherein the threshold is approximately 0.60.

AMENDED SHEET (ARTICLE 19)