DIREC project

Privacy and Machine Learning

Summary

There is an unmet need for decentralised privacy-preserving machine learning. Cloud computing has great potential, but trust in service providers is lacking and there is a risk of data breaches. Much data is private and stored locally for good reasons, yet combining that information in a global machine learning system could lead to services that benefit everyone. For example, imagine a consortium of banks that wants to improve fraud detection by pooling their customers' payment data and combining it with data from, e.g., Danmarks Statistik (Statistics Denmark). For competitive reasons, however, the banks want to keep their customers' data secret, and Danmarks Statistik is not permitted to share the required sensitive data. Another example is patient information (e.g. medical images) stored at hospitals. It would be highly valuable to build diagnostic and prognostic tools using machine learning on these data, but the data typically cannot be shared.

Project period: 2020-2024
Budget: DKK 4.7 million

More about the project

The research aim of the project is the development of AI methods and tools that enable industry to develop new solutions for automated image-based quality assessment. End-to-end learning of features and representations for object classification by deep neural networks can lead to significant performance improvements. Several mechanisms have recently been developed to further improve performance and reduce the need for manual annotation work (labelling), including semi-supervised learning strategies and data augmentation.

Semi-supervised learning combines generative models trained without labels (unsupervised learning) and the application of pre-trained networks (transfer learning) with supervised learning on small sets of labelled data. Data augmentation employs both knowledge-based transformations, such as translations and rotations, and more general learned transformations, such as parameterised “warps”, to increase variability in the training data and improve robustness to natural variation.
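
The knowledge-based transformations mentioned above can be illustrated with a minimal sketch. It assumes images are small 2D grids stored as lists of lists; the function names `translate`, `rotate90`, and `augment` are hypothetical and not taken from the project. Learned “warps” are not covered.

```python
import random

def translate(image, dy, dx):
    """Shift the image by dy rows and dx columns, wrapping at the borders."""
    h, w = len(image), len(image[0])
    return [[image[(r - dy) % h][(c - dx) % w] for c in range(w)]
            for r in range(h)]

def rotate90(image, k):
    """Rotate the image clockwise by k * 90 degrees."""
    image = [list(row) for row in image]  # work on a copy
    for _ in range(k % 4):
        # Reversing the rows and then transposing is a clockwise quarter turn.
        image = [list(row) for row in zip(*image[::-1])]
    return image

def augment(image, rng):
    """Return one randomly translated and rotated copy of the image."""
    dy = rng.randint(-2, 2)   # assumed shift range, chosen for illustration
    dx = rng.randint(-2, 2)
    k = rng.randrange(4)      # 0, 90, 180 or 270 degrees
    return rotate90(translate(image, dy, dx), k)

rng = random.Random(0)
image = [[4 * r + c for c in range(4)] for r in range(4)]
augmented = [augment(image, rng) for _ in range(8)]
```

Each augmented copy contains the same pixel values as the original in a different arrangement, so a small labelled set yields more varied training examples. Wrap-around shifting is used only to keep the sketch short; practical pipelines usually pad or crop instead.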

Value Creation

Research into the secure use of sensitive data will benefit society at large. CoED-based ML addresses the fundamental problem of keeping private input data private while still enabling the use of the most widely used analytical tools. The CoED privacy-preserving technology reduces the risk of data breaches. It allows for secure use of cloud computing with no single point of failure, and removes the fundamental cloud security problem of missing trust in service providers.
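
To give a feel for how inputs can stay private while a joint result is still computed, here is a minimal sketch of additive secret sharing, one common building block of secure multiparty computation. It assumes that CoED refers to computing on encrypted data in this broad sense; the modulus, the three-bank scenario, and all names are hypothetical and do not describe the project's actual protocols.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical public modulus agreed on by all parties

def share(value, n_parties):
    """Split value into n_parties random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Recombine shares; any proper subset reveals nothing about the value."""
    return sum(shares) % MODULUS

# Three hypothetical banks each hold a private count of suspicious payments.
private_inputs = [120, 45, 310]
n = len(private_inputs)

# Bank i sends the j-th share of its input to party j.
all_shares = [share(v, n) for v in private_inputs]

# Each party adds the shares it received, without seeing any raw input.
partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                for j in range(n)]

# Only the combined partial sums reveal the total.
total = reconstruct(partial_sums)
```

No single party ever holds another party's input in the clear, which is the sense in which such schemes avoid a single point of failure. Real systems add protection against dishonest parties and support far richer computations than a sum.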

The project will bring together leading experts in CoED and ML. It may serve as a starting point for attracting additional national and international funding, and it will build up competences highly relevant for Danish industry. The concepts developed in the project may change how organisations collaborate and allow for innovative ways of using data, which can increase the competitiveness of Danish companies relative to large international players.

Value

Research into CoED-based machine learning for the secure use of sensitive data will improve data protection, increase security in cloud computing, foster collaboration among experts, attract funding, and strengthen the competitiveness of Danish companies.