The Computer Science (CS) field of Cryptography is currently one of the most hyped topics in the software industry, and for good reason: in a world that is ever more connected, with the Internet and the paradigm of connected devices central to these developments, security and data quality become critical issues. With the number of connected devices expected to increase massively in the coming years, the integrity of information flowing through the web will be enormously important. No wonder Cryptography is becoming so critical to all of this.
The other important Computer Science topic that is becoming critical is Machine Learning, as this site already knows a bit about from its posts on Big Data and Deep Learning. Until now, these two topics seemed to live on separate islands with no obvious connection between them. My chosen review paper of today manages to bridge that apparent gap, with fascinating research that brings in another interesting and critical current CS topic: Application Programming Interfaces (APIs), chunks of specialised code designed to provide protocols for software integration and to connect different parts of a software stack that operate on different machines for diverse purposes. And that is precisely what is needed to integrate Machine Learning (ML) algorithms with cryptographic protocols. But this paper is one, or maybe several, steps ahead in this game, as the authors investigate how to protect sensitive data from attacks on ML-as-a-service or “predictive analytics” systems, that is, attacks mounted through prediction APIs:
Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service (“predictive analytics”) systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis.
The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model’s parameters or training data, aims to duplicate the functionality of (i.e., “steal”) the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures.
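To make the extraction idea concrete, here is a minimal sketch of the equation-solving style of attack the paper describes for logistic regression. Because the logit of the returned confidence value is linear in the input, an attacker who sees confidences can recover a model with d features from just d + 1 queries. The victim model, its weights, and the `query` function below are all hypothetical stand-ins for a real prediction API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "victim": a logistic regression model with secret parameters.
d = 5
w_true = rng.normal(size=d)
b_true = 0.7

def query(X):
    """Stand-in for a black-box prediction API that returns confidence values."""
    return 1.0 / (1.0 + np.exp(-(X @ w_true + b_true)))

# Attacker side: d + 1 queries suffice to pin down the d + 1 unknowns,
# since logit(confidence) = w . x + b is linear in x.
X = rng.normal(size=(d + 1, d))
p = query(X)
logits = np.log(p / (1.0 - p))

A = np.hstack([X, np.ones((d + 1, 1))])  # augment inputs for the bias term
theta = np.linalg.solve(A, logits)       # solve the linear system exactly
w_rec, b_rec = theta[:-1], theta[-1]

print(np.allclose(w_rec, w_true), np.isclose(b_rec, b_true))
```

This is exactly why the abstract notes that omitting confidence values is a natural countermeasure: with only class labels, the attacker loses the linear equations and must fall back on the harder, query-hungrier attacks the authors also analyse.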
The important issue of how to protect an ML model and its training data from these attacks highlights the need for careful ML model deployment, and for extra effort and focus on model extraction countermeasures, all against a background of ever more connected data living in off-premises cloud environments. A timely and worthy paper, very important to all IT and data professionals.
I will certainly come back to the topic of Machine Learning and Cryptography in future paper reviews on this blog.
Feature Image: Cryptography article from Wikipedia