Project type: SCITECH Project

Privacy and Machine Learning

There is an unmet need for decentralised privacy-preserving machine learning. Cloud computing has great potential; however, there is a lack of trust in the service providers, and there is a risk of data breaches. Many datasets are private and stored locally for good reasons, but combining the information in a global machine learning (ML) system could lead to services that benefit all. For instance, consider a consortium of banks that want to improve fraud detection by pooling their customers’ payment data and merging these with data from, e.g., Statistics Denmark. However, for competitive reasons the banks want to keep their customers’ data secret, and Statistics Denmark is not allowed to share the required sensitive data. As another example, consider patient information (e.g., medical images) stored at hospitals. It would be valuable to build diagnostic and prognostic tools using ML based on these data; however, the data typically cannot be shared.

The research aim of the project is the development of AI methods and tools that enable industry to develop new solutions for automated image-based quality assessment. End-to-end learning of features and representations for object classification by deep neural networks can lead to significant performance improvements. Several mechanisms have recently been developed to further improve performance and reduce the need for manual annotation work (labelling), including semi-supervised learning strategies and data augmentation.

Semi-supervised learning combines generative models trained without labels (unsupervised learning) and pre-trained networks (transfer learning) with supervised learning on small sets of labelled data. Data augmentation employs both knowledge-based transformations, such as translations and rotations, and more general learned transformations, such as parameterised “warps”, to increase variability in the training data and improve robustness to natural variation.
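The knowledge-based transformations mentioned above can be illustrated with a minimal sketch, assuming grayscale images stored as lists of pixel rows. The function names (rotate90, translate, augment) and the max_shift parameter are hypothetical and not taken from the project; learned warps and semi-supervised training are not covered here.

```python
import random
from typing import List

Image = List[List[float]]  # assumption: a grayscale image as a list of pixel rows


def rotate90(image: Image, k: int) -> Image:
    """Rotate an image counter-clockwise by k * 90 degrees (returns a copy)."""
    image = [list(row) for row in image]
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        image = [list(row) for row in zip(*image)][::-1]
    return image


def translate(image: Image, dy: int, dx: int, fill: float = 0.0) -> Image:
    """Shift an image down by dy and right by dx pixels (negative values shift up/left).

    Pixels shifted in from outside the image are set to `fill`.
    """
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands on (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out


def augment(image: Image, rng: random.Random, max_shift: int = 2) -> Image:
    """Apply a random quarter-turn rotation followed by a random small translation."""
    k = rng.randrange(4)
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return translate(rotate90(image, k), dy, dx)


# Usage: create several augmented copies of one (hypothetical) labelled image.
rng = random.Random(0)
digit = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
batch = [augment(digit, rng, max_shift=1) for _ in range(4)]
```

Each augmented copy keeps the original label, so a small labelled set yields more varied training examples; quarter-turn rotations are used here only because they need no interpolation.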

Researching secure use of sensitive data will benefit society at large. ML based on computing on encrypted data (CoED) solves the fundamental problem of keeping private input data private while still enabling the use of the most widely applied analytical tools. The CoED privacy-preserving technology reduces the risk of data breaches. It allows for secure use of cloud computing, with no single point of failure, and removes the fundamental cloud security problem of having to trust the service providers.
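One common building block of computing on encrypted data is additive secret sharing, sketched minimally below. This is an illustrative technique, not necessarily the protocol used in the project, and it assumes a fixed number of non-colluding servers and integer-valued inputs. The field modulus PRIME and the functions share, reconstruct, add_shared and scale_shared are hypothetical names. Every individual share is uniformly random, so no single server learns any input, yet sums and products with public constants can be computed locally on the shares.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # field modulus (a Mersenne prime); an illustrative choice


def share(value: int, n_servers: int) -> List[int]:
    """Split value into n_servers additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: List[int]) -> int:
    """Recombine all shares; any proper subset reveals nothing about the value."""
    return sum(shares) % PRIME


def add_shared(a: List[int], b: List[int]) -> List[int]:
    """Each server adds its own two shares locally; no communication is needed."""
    return [(x + y) % PRIME for x, y in zip(a, b)]


def scale_shared(c: int, a: List[int]) -> List[int]:
    """Each server multiplies its share by a public constant c, e.g. a model weight."""
    return [(c * x) % PRIME for x in a]


# Usage: two banks secret-share hypothetical fraud counts among three servers.
bank_a = share(120, n_servers=3)
bank_b = share(75, n_servers=3)
total = reconstruct(add_shared(bank_a, bank_b))  # only the combined count is revealed
```

Multiplying two secret values (needed for most ML models) requires interaction between the servers and is beyond this sketch.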

The project will bring together leading experts in CoED and ML. It may serve as a starting point for attracting additional national and international funding, and it will build up competences highly relevant to Danish industry. The concepts developed in the project may change how organisations collaborate and allow for innovative ways of using data, which can increase the competitiveness of Danish companies relative to large international players.

October 1, 2020 – September 30, 2024 – 3.5 years.

Total budget: DKK 4.7 million / DIREC investment: DKK 3.22 million

Participants

Project Manager

Peter Scholl

Assistant Professor

Aarhus University
Department of Computer Science

E: peter.scholl@cs.au.dk

Ivan Bjerre Damgaard

Professor

Aarhus University
Department of Computer Science

Christian Igel

Professor

University of Copenhagen
Department of Computer Science

Kurt Nielsen

Associate Professor

University of Copenhagen
Department of Food and Resource Economics

Rahul Rachuri

PhD Student

Aarhus University
Department of Computer Science

Hiraku Morita

Postdoc

University of Copenhagen
Department of Computer Science

Partners