
DIREC Seminar 2022

26 – 27 SEPTEMBER 2022
HELNAN HOTEL MARSELIS – AARHUS

Thank you so much to all who participated in this year’s DIREC seminar – we hope to see you again next year, where we will take it one step further.

Two fantastic days with a focus on digital technologies and computer science are over. Thanks to everyone who helped make the days a success.

We will continually upload the presentations from the days below.

Monday 26 September

Software Research: Impact and Challenges

Abstract: Software is an essential, yet invisible, driving force of the present world. There is, however, a striking contrast between, on the one hand, the omnipresence of software in our society and, on the other hand, the extraordinary difficulty of guaranteeing the correctness, reliability, performance, scalability, safety, and sustainability of modern software systems. There is an urgent need for software engineering innovations: the world of software is a moving target, due to the ever-increasing size and complexity of software, the technological churn of both hardware and software, the increased heterogeneity of software, and the emergence of new societal and technological challenges. Fostering such innovations requires fundamental software research, independently of specific applications. In this talk I will outline the major challenges in software research, what is needed to address these challenges, and the expected impact on software in our society.

Bio: Marieke Huisman is a professor in Software Reliability at the University of Twente. She is well-known for her work on program verification of concurrent software. In 2011, she obtained an ERC Starting Grant, which she used to start development of the VerCors verifier, a tool for the verification of concurrent software. Currently, as part of her NWO personal VICI grant Mercedes, she is working on further improving verification techniques, both by enabling the verification of a larger class of properties and by making verification more automatic. Since 2019 she has been SC chair of ETAPS. Besides her scientific work, she also actively works on topics related to diversity, equity and inclusion, as well as science policy. She is a member of the executive board of VERSEN, the Dutch association of software researchers, and chaired this association from 2018 until 2021. She is also a member of the round table for computer science of the Dutch Research Council.

Mosaics of Big Data

Database Systems and Information Management – Trends and a Vision

Abstract: The global database research community has greatly impacted the functionality and performance of data storage and processing systems along the dimensions that define “big data”, i.e., volume, velocity, variety, and veracity. Locally, over the past five years, we have also been working on varying fronts. Among our contributions are: (1) establishing a vision for a database-inspired big data analytics system, which unifies the best of database and distributed systems technologies, and augments it with concepts drawn from compilers (e.g., iterations) and data stream processing, as well as (2) forming a community of researchers and institutions to create the Stratosphere platform to realize our vision. One major result from these activities was Apache Flink, an open-source big data analytics platform and its thriving global community of developers and production users.

Although much progress has been made, when looking at the overall big data stack, a major challenge for the database research community still remains: how to maintain ease of use despite the increasing heterogeneity and complexity of data analytics, involving specialized engines for various aspects of an end-to-end data analytics pipeline, including, among others, graph-based, linear algebra-based, and relational-based algorithms, and the underlying, increasingly heterogeneous hardware and computing infrastructure.

At TU Berlin, DFKI, and the Berlin Institute for Foundations of Learning and Data (BIFOLD) we currently aim to advance research in this field via the NebulaStream and Agora projects. Our goal is to remedy some of the heterogeneity challenges that hamper developer productivity and limit the use of data science technologies to just the privileged few, who are coveted experts. In this talk, we will outline how state-of-the-art SPEs have to change to exploit the new capabilities of the IoT and showcase how we tackle IoT challenges in our own system, NebulaStream. We will also present our vision for Agora, an asset ecosystem that provides the technical infrastructure for offering and using data and algorithms, as well as physical infrastructure components.

Bio: Volker Markl is a German Professor of Computer Science. He leads the Chair of Database Systems and Information Management at TU Berlin and the Intelligent Analytics for Massive Data Research Department at DFKI. In addition, he is Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD). He is a database systems researcher, conducting research at the intersection of distributed systems, scalable data processing, and machine learning. Volker led the Stratosphere project, which resulted in the creation of Apache Flink.

Volker has received numerous honors and prestigious awards, including two ACM SIGMOD Research Highlight Awards and best paper awards at ACM SIGMOD, VLDB, ICDE, and EDBT. He was recognized as an ACM Fellow for his contributions to query optimization, scalable data processing, and data programmability. In 2014, he was elected one of Germany's leading “Digital Minds” (Digitale Köpfe) by the German Informatics Society. He is a member of the Berlin-Brandenburg Academy of Sciences and serves as advisor to academic institutions, governmental organizations, and technology companies. Volker holds eighteen patents and has been co-founder and mentor to several startups.

Workshops

Organiser: Christian S. Jensen, Aalborg University

Invited technical talk by Volker Markl:
NebulaStream: Data Management for the Internet of Things

Organisers: Rasmus Pagh, University of Copenhagen & Rico Jacob, IT University of Copenhagen

11.30: Invitation to differential privacy
Boel Nelson and Rasmus Pagh, University of Copenhagen

12.00: Algorithmic Cheminformatics
Jakob Andersen, University of Southern Denmark

12.30:  A study on succinct data structures
Mingmou Liu, University of Copenhagen

Organiser: Susanne Bødker, Aarhus University

11.30 Presentation and status of people

12.00 Gaze and Eye Movement in Interaction
by Hans Gellersen, Aarhus University/Lancaster University

Eye movement and gaze are central to human interaction with the world. Our visual system not only enables us to perceive the world, but also provides exquisite control of the movements we make in the world. The eyes are at the heart of this, never still, and in constant interaction with other parts of our body to direct visual attention, extract information from the world, and guide how we navigate and manipulate our environment. Where we look implicitly reflects our goals and information needs, while we are also able to explicitly direct our gaze to focus attention and express interest and intent. This makes gaze a formidable modality for human-computer interaction (HCI).

In this talk, I will highlight how closely the movement of our eyes is coupled with other movement – of objects in the visual field, as well as of our hands, head, and body – and discuss examples of novel interfaces that leverage eye movement in concert with other motion.

Hans Gellersen is Professor of Interactive Systems at Lancaster University and Aarhus University. His research background is in sensors and devices for ubiquitous computing and human-computer interaction and he has worked on systems that blend physical and digital interaction, methods that infer context and human activity, and techniques that facilitate spontaneous interaction across devices. Over the last ten years a main focus of his work has been on eye movement. In 2020, he was awarded an ERC Advanced Grant by the European Research Council for research on Gaze and Eye Movement in Interaction.

Organiser: Jan Madsen, Technical University of Denmark

11:30  The CPS ecosystem — status

11:45  Partner presentations covering scientific focus, application domains, people, and key projects (15 min each):

  • How to build a Digital Twin
    Mirgita Frasheri (AU)
  • Hardware/Software Trade-off for the Reduction of Energy Consumption (Explore)
    Maja and Martin (RUC, DTU)
  • AAU?
  • ITU?
12:30 Identifying the Grand Challenges of CPS 

Organiser: Kim Guldstrand Larsen, Aalborg University

SESSION 1 Verification

BRIDGE: Verifiable and Safe AI for Autonomous Systems

Overview and Status
by Kim Guldstrand Larsen, Aalborg University

HOFOR Case and Strategy Representation
by Andreas Holck Høegh-Petersen, IT University of Copenhagen

Aarhus Vand Case and Reinforcement Learning
by Martijn Goorden (AAU)

TECHNICAL LIGHTNING TALKS

Verification of Dynamical Systems
by Max Tschaikowski, Aalborg University

Verification of Neural Network Control Systems
by Christian Schilling, Aalborg University

Formal Verification and Robust Machine Learning
by Alessandro Bruni, IT University of Copenhagen

EXPLORE: Verifiable and Robust AI FUTURE projects
ALL

EXPLORE: Certifiable Controller Synthesis for Cyber-Physical Systems FUTURE projects
Martijn Goorden (short status)

Organiser: Mads Nielsen, University of Copenhagen

11:30: Overview of DIREC and Pioneer Centre activities
Mads Nielsen, KU

11:40: EXPLAIN ME, Explainable AI for Medical Education
Aasa Feragen, DTU

12:00: HERD - Human-AI Collaboration: Engaging and Controlling Swarms of Robots and Drones
Anders Lyhne Christensen (SDU), Maria-Theresa Oanh Hoang (AAU), Kasper Andreas Rømer Grøntved (SDU)

12.20: Trimming Data Sets: a Verified Algorithm for Robust Mean Estimation
Alessandro Bruni, IT University of Copenhagen

12.40: Privacy and Machine Learning
Peter Scholl, Aarhus University

Organiser: Claudio Orlandi, Aarhus University

Part I: Differential Privacy

(joint session with New Perspectives on Algorithms and Datastructures)

11.30 Invitation to differential privacy
Boel Nelson and Rasmus Pagh, University of Copenhagen

12.10 - Small break to change room

Part II: Security in AI
(joint session with AI – Machine Learning, Computer Vision, NLP)

12.20 Trimming Data Sets: a Verified Algorithm for Robust Mean Estimation
Alessandro Bruni, IT University of Copenhagen

12.40 Privacy and Machine Learning
Peter Scholl, Aarhus University


Workshops continued

Organiser: Christian S. Jensen, Aalborg University

Spatial Data Management
 
14.05 – 14.30: Efficient Data Management for Modern Spatial Applications and the Internet of Moving Things
by Eleni Tzirita Zacharatou (ITU)
 
14.30 – 14.55: Building a maritime traffic network for route optimization using AIS data
by Búgvi Benjamin Jónleifsson Magnussen & Nikolaj Blæser (RUC)
 
14.55 – 15.20: Big Mobility Data Analytics: Algorithms and Techniques for Efficient Trajectory Clustering
by Panagiotis Tampakis (SDU)

Organisers: Rasmus Pagh, University of Copenhagen & Rico Jacob, IT University of Copenhagen

14.00: Stochastic Games with Limited Memory Space
Kristoffer Hansen, Aarhus University

14.30: Recent Advances in I.I.D. Prophet Inequalities
Kevin Schewior, University of Southern Denmark

15.00: New algebraic formula lower bounds for Iterated Matrix Multiplication
Nutan Limaye, IT University of Copenhagen

Organiser: Susanne Bødker, Aarhus University

14.00 Rework – status, presentation and workshop

15.00 Wrap-up and a quick discussion of the Danish HCI research day

Organiser: Jan Madsen, Technical University of Denmark

14:00  Presentation of current WS6 DIREC projects (15 min each):

Biochip routing, (Explore)
Luca and Kasper (DTU)

Technologies for executing AI in the edge, (Bridge)
Emil and Ahmad (DTU, SDU)

Adaptive Neural Networks on Embedded Platforms,
Jalil Boudjadar (AU)

CPS with HITL, (Explore)
Mahyar Touchi Moghaddam (SDU)

Business Models for Embedded AI - Current case company business models and beyond
Reza and Ben (CBS)

15:15  Conclusions of the day

Organiser: Kim Guldstrand Larsen, Aalborg University

SESSION 2 Software Engineering

BRIDGE: SIOT – Secure Internet of Things – Risk analysis in design and operation
by Jaco van de Pol & Alberto Lafuente (short status)

EXPLORE: DeCoRe: Tools and Methods for the Design and Coordination of Reactive Hybrid Systems
by Thomas Hildebrandt (short status & technical talk)

TECHNICAL LIGHTNING TALKS

Lightweight verification of concurrent and distributed systems
by Alceste Scala (DTU)

Certified model checking – verifying the verifier
by Jaco van de Pol, Aarhus University

Refinement and compliance
by Hugo-Andrés López, Technical University of Denmark

Differential Testing of Pushdown Reachability with a Formally Verified Oracle
by Anders Schlichtkrull, Aalborg University

Monitoring of Timed Properties
by Kim G. Larsen, Aalborg University

Organiser: Mads Nielsen, University of Copenhagen

14:00 Large-scale Neuroimaging Study on a Danish Cohort: COVID-19, Brain Volume, and microbleeds
Kiril Klein, University of Copenhagen

14:15 Fetal Ultrasound scanning assistance
Manxi Lin, Technical University of Denmark

14:30 Inducing Gaussian Process Networks
Thomas Dyhre, Aalborg University

14:45 Bridge project: Deep Learning and Automation of Imaging-Based Quality of Seeds and Grains
Lars Kai Hansen, Technical University of Denmark

15:00 Fine-Grained Image Generation with Super-Resolution
Andreas Aakerberg & Thomas Moeslund, Aalborg University

15:15 Summary of workshop
Mads Nielsen, University of Copenhagen

Organiser: Claudio Orlandi, Aarhus University

14.00: Security Protocols as Choreographies
by Marco Carbone

14.20:  A formal security analysis of Blockchain voting
by Bas Spitters, Aarhus University

14.40: Challenges in anti-money laundering and how cryptography can help
by Tore Frederiksen, The Alexandra Institute

15.00: Networking


One Minute Madness
Presentation of DIREC projects followed by Q&A

Tuesday 27 September

Moderator: Jan Madsen, Technical University of Denmark

Abstract
Quantum computers have the potential to solve certain tasks that would take millennia to complete even with the fastest (conventional) supercomputer. Numerous quantum computing applications with a near-term perspective (e.g., for finance, chemistry, machine learning, optimization) and with a long-term perspective (i.e., cryptography, database search) are currently investigated. However, while impressive accomplishments can be observed in the physical realization of quantum computers, the development of automated methods and software tools that provide assistance in the design and realization of applications for those devices is at risk of not being able to keep up with this development anymore.

This may lead to a situation where we might have powerful quantum computers but hardly any proper means to actually use them. In this talk, we discuss how design automation can help to address this problem. This also includes an overview of corresponding software tools for quantum computers covering simulation, compilation, and verification.

Bio
Robert Wille is a Full and Distinguished Professor at the Technical University of Munich, Germany, and Chief Scientific Officer at the Software Competence Center Hagenberg, Austria (a technology transfer company with 100 employees).

He received the Diploma and Dr.-Ing. degrees in Computer Science from the University of Bremen, Germany, in 2006 and 2009, respectively. Since then, he worked at the University of Bremen, the German Research Center for Artificial Intelligence (DFKI), the University of Applied Science of Bremen, the University of Potsdam, and the Technical University Dresden. From 2015 until 2022, he was Full Professor at the Johannes Kepler University Linz, Austria, until he moved to Munich.

His research interests are in the design of circuits and systems for both conventional and emerging technologies. In these areas, he has published more than 400 papers and served on editorial boards as well as program committees of numerous journals and conferences such as TCAD, ASP-DAC, DAC, DATE, and ICCAD. For his research, he has received, among other distinctions, Best Paper Awards at TCAD and ICCAD, an ERC Consolidator Grant, a Distinguished and a Lighthouse Professor appointment, and a Google Research Award.

Workshops

Organiser: Mark Riis, Technical University of Denmark

Collaboration on entrepreneurship across universities

  • Recap on WS 13 activities in 2021-2022 – activities, budget etc.
    Mark Riis, DTU Compute

  • Open Entrepreneurship – learnings from inviting investors into universities
    Rasmus S. B. Jensen, Open Entrepreneurship, DTU Compute

  • Young Researcher Entrepreneurship - results and experiences
    Camilla N. Jensen, AI Pioneer Centre, DTU Skylab

  • Digital Tech Summit - results and experiences
    Mark Riis, DTU Compute

  • Discussion, learnings and knowledge sharing

Which joint activities should we initiate in 2022-2023?
  • DIREC at Digital Tech Summit 2022
  • Other activities in relation to supporting entrepreneurship and collaboration across universities

Organiser: Mikkel Baun Kjærgaard, University of Southern Denmark

How to make computing education appeal to a broader range of students
by Claus Brabrand, IT University of Copenhagen

We present recent research on gender diversity in Computing. Recent research documents strong and significant gender effects related to the interests in working with PEOPLE vs THINGS along several dimensions. In particular, this relates to the themes of teaching/learning activities (i.e., the themes of exercises, projects, and examples), the framing of advertisement materials, and the composition of courses on educational programmes. We will explain these results and effects as well as give actionable evidence-based recommendations for how to make Computing educational activities & programmes appeal to a broader range of students.

How digital learning technology can provide insights on teaching quality of large classrooms
by Md Saifuddin Khalid, Department of Mathematics and Computer Science, Technical University of Denmark

Semester-end and mid-term online feedback provide important information for both students and course instructors. Unfortunately, the teaching quality evaluation tools used at Danish universities are often time consuming and do not allow for self-reflection on teaching and learning, which could enable mutual understanding and collaboration between students and course instructors. Join us at this workshop, where we will provide a tutorial and share experiences from two large courses adopting Wyblo. Wyblo is a people-centered learning experience platform which provides useful insights on teaching quality to both course instructors and students.

How to use technology to scale courses
by Jakob Lykke Andersen, Dept. of Mathematics and Computer Science, University of Southern Denmark and Ulrik Nyman, Dept. of Computer Science, Aalborg University

In this workshop we will discuss how to use software and infrastructure for scaling and improving quality of teaching in Computer Science. As inspiration for the discussion, we have two presentations:

Teaching 400 students to program in 16 weeks with 3 teachers and 17 teaching assistants
by Jon Sporring & Ken Friis Larsen, KU

At the Department of Computer Science, University of Copenhagen, we have recently upscaled our introduction to programming for our bachelor courses. In the last 5 years, we have grown from 200 to 400 students, and in the process, we have developed IT tools to both manage the growth and at the same time increase the learning quality. In this talk, we will discuss the pedagogical challenges, the resource challenges, the developed tools for helping the students self-learn and give the students structured feedback, and the lessons learned in the process.

Automatic feedback and correction of programming software assignments for scalable teaching
by Miguel Enrique Campusano Araya & Aisha Umair, SDU

In this talk, we present Scalable Teaching. This tool uses automatic testing to grade students’ programming assignments and provides feedback to them automatically. Moreover, Scalable Teaching allows professors to grade assignments and give feedback manually more efficiently. We have successfully tested this tool in several software engineering courses with more than 100 students.

Organiser: Thomas Hildebrandt, University of Copenhagen

Make your research visible and understood outside academia
by Peter Hyldgård, Sciencecom.dk

  • Be heard - and understood
  • Tell a good story about your research
  • Pitch your research
  • Talk about your research to non-peers (your Aunt Erna...)
How do you tell a simple story about your research that everyone can understand - without compromising on the academic content?

And how do you build a bridge to an audience that does not have any immediate interest in/knowledge of your topic?

The speaker will introduce a number of simple tools for finding a story about your research, which can be used in many contexts: when you have to seek funding, when you are interviewed by a journalist – or when you must tell your Uncle Adam about your work. The workshop will be a mixture of presentations and small exercises, with a slightly larger final exercise where the participants will give a – very short – oral 'pitch' of their research.

Organiser: Helle Zinner Henriksen, Copenhagen Business School

End of the Rainbow

In this session we will discuss how technical solutions and ideas from some of  the DIREC projects can be diffused to a wider context, supporting innovation and impact.

Session speakers:

  • Geet Khosla, Tech entrepreneur with particular focus on leveraging technologies with massive potential to have a positive impact.
  • Martin Møller, Chief Scientific Officer at the Alexandra Institute
  • Peter Gorm Larsen, Professor at Department of Electrical and Computer Engineering - Software Engineering & Computing systems at AU
  • Ben Eaton, Associate professor at Department of Digitalization at CBS.
The session focuses on the business potential and revolves around the question “How to harvest spill-over benefits from foundational tech research?”

Inspired by the session speakers’ input the audience is invited to contribute to the session in the discussion of potential avenues to address the question. The aim is to illustrate the benefit of addressing tech and business.

Organiser: Jan Madsen, Technical University of Denmark

10:00 – 12:00 Tutorial:


• Basic concepts
• Models of computation
• Use cases / Applications
• Tools + Integration to host
 
Speakers:


12:00  - 12:30  Open discussion on opportunities for and in DIREC
Discussion leader: Sven Karlsson, DTU Compute

How can data accelerate the green transformation?

Hierarchical forecast reconciliation
by Jan Kloppenborg Møller, DTU Compute

A unique collaboration between a university and a private company

In 2019, the Swiss non-profit Concordium Foundation founded the Concordium Blockchain Research Centre Aarhus at Aarhus University (AU). This is a unique example of collaboration between a university and a company where the company sponsors the research carried out at the university with a substantial amount of money.

In this session Associate Professor Bas Spitters from Aarhus University and Senior Researcher Daniel Tschudi from Concordium will share their experiences from the collaboration and comment on issues like:

  • What is collaboration about?
  • What is the current status?
  • What are the future challenges?
  • How did the collaboration start?
  • What do the researchers get out of it?
  • What does Concordium get out of it?
  • How does Concordium embed/anchor research activities within Concordium?
  • How does the collaboration work out in practice?
  • How does one handle the borderline between research to be carried out in the center and development to be carried out in the company?
  • What is their advice to researchers regarding similar collaborations?

About Concordium Blockchain Research Centre Aarhus

The research center provides the basic research needed to build energy-efficient and scalable blockchain technology that is provably secure. Along the way, many discoveries are expected in the blockchain space and related sciences that we cannot anticipate at the outset.

About the Swiss non-profit Concordium Foundation

The mission is to fund research in the blockchain space, and build a new foundational blockchain with focus on business and regulatory compliance. The center performs free, basic research in the theory and technology underlying blockchains. All research performed in the center is open source and patent free and will help build a solid foundation for the entire blockchain space.

  • Lars Bak, Former Head of Google's Development Dept. in Denmark,
  • Steffen Grarup, Uber
  • Kresten Krab Thorup, Founder of Humio
Moderator: Professor Ole Lehrmann Madsen

Lars, Steffen and Kresten are all graduates from the Department of Computer Science at Aarhus University. They have all made impressive careers with high-tech companies in Silicon Valley and Denmark. These companies include NeXT Computer, Sun Microsystems, VMware, Google, and Uber. They have also been involved in a number of start-ups including Animorphic Systems, OOVM, Toitware, Trifork and Humio. These endeavors have resulted in the development of a large palette of new innovative digital technologies.

In the panel they will tell us about their experience and highlight the most important lessons from their careers, including their life as computer science students. We will ask them about their advice to students and young graduates of today regarding how to get an interesting career working with ground-breaking digital technologies and getting them out in successful products.

Speakers

Marieke Huisman

Professor in Software Reliability
University of Twente

Volker Markl

Professor of Computer Science
Technische Universität Berlin

Robert Wille

Professor
Technical University of Munich

Lars Bak

Former Head of Google's division in Denmark

Steffen Grarup

Senior Director Engineering
Uber Technologies

Kresten Krab Thorup

Founder and former CTO
Humio

Daniel Tschudi

Senior Researcher
Concordium

Bas Spitters

Associate Professor, Aarhus University


Online Algorithms with Predictions

Project type: SCITECH Project

All industrial sectors face optimization problems, and usually many of them, i.e., situations where one must optimize with respect to some resource. This could be minimizing material usage, or it could be optimizing time or space consumption. Examples include cutting shapes from expensive material, packing containers as efficiently as possible to minimize transportation costs, or scheduling routes or dependent tasks to finish as early as possible.

In some cases, all information is available when the processing of tasks commences, but in many situations, tasks arrive during the process, and decisions regarding their treatment must be made shortly after their arrival before further tasks appear. Such problems are referred to as “online”. Obviously, online problems lead to poorer solutions than one can obtain with their offline counterparts, unless fairly precise, additional information about the future tasks is available. In designing and analyzing algorithms, in general, the goal is to determine the quality of an algorithmic solution, preferably with guarantees on performance for all inputs, so that it is possible to promise delivery times or bounds on expenses, etc. Such an analysis also allows the designer to determine if it would be beneficial to search for other algorithmic solutions. Assessing the quality of the algorithms experimentally suffers from the difficulty of determining which inputs to test on and providing trustworthy worst-case bounds.

The area of online algorithms has existed for many years and provides analyses giving worst-case guarantees. However, since these guarantees hold for all inputs, even the most extreme and sometimes unrealistic ones, they are very pessimistic and often not suited for choosing good algorithms for the typical cases. Thus, in practice, companies often use techniques based on heuristic methods, machine learning, etc. Machine learning, especially, has proven very successful in many applications in providing solutions that are good in practice when presented with typical inputs. However, on input not captured by training data, the algorithm may fail dramatically.

We need combinations of the desirable properties of guarantees from the online algorithms world and of the experienced good behavior on typical input from, for instance, the machine learning world. That is, we need algorithms that follow predictions given from a machine learning component, for instance, since that often gives good results, but it should not do so blindly or the worst-case behavior will generally be even worse than the guarantees provided by standard online algorithms and their analyses. Thus, a controlling algorithmic unit should monitor the predictions that are given so that safety decisions can overrule the predictions when things are progressing in a worrisome direction.
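To make this combination concrete, here is a minimal, hedged sketch using the classic ski-rental problem (a standard textbook example, not one of the problems studied in this project): the algorithm follows an untrusted prediction via a trust parameter, yet its cost stays bounded even when the prediction is wrong. The function names and parameter values are invented for illustration.

import math

def buy_day_with_prediction(buy_cost, predicted_days, trust=0.5):
    """Day on which to stop renting (1 per day) and buy skis (cost buy_cost),
    guided by an untrusted prediction of the season length.

    trust in (0, 1]: small values follow the prediction aggressively
    (near-optimal when the prediction is accurate); values close to 1
    recover the classic worst-case strategy of buying on day buy_cost.
    """
    if predicted_days >= buy_cost:       # prediction: long season -> buy early
        return math.ceil(trust * buy_cost)
    return math.ceil(buy_cost / trust)   # prediction: short season -> delay buying

def total_cost(buy_day, true_days, buy_cost):
    # rent one unit per day until the buy day; afterwards the skis are owned
    if true_days < buy_day:
        return true_days
    return (buy_day - 1) + buy_cost

# Accurate prediction -> cost close to optimal; wrong prediction -> cost still bounded.
for predicted in (90, 3):
    day = buy_day_with_prediction(buy_cost=10, predicted_days=predicted)
    print(predicted, total_cost(day, true_days=100, buy_cost=10))

In this sketch, a trust value of 0.5 keeps the cost within roughly three times the offline optimum no matter how wrong the prediction is, while an accurate prediction brings the cost close to optimal – exactly the kind of controlled trade-off described above.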

We also need ways of quantifying the guaranteed quality of our solutions as a function of how closely an input resembles the predicted (by a machine learning component, for instance) input. This is a crucial part of risk management. We want reassurance that we do not “fall off the cliff” just because predictions are slightly off. This includes limiting the ”damage” possible from machine learning adversarial attacks. As an integral part of a successful approach to this problem, we need measures developed to quantify an input’s distance from the prediction (the prediction error) that are defined in such a manner that quality can be expressed as a function of the prediction error. For online algorithm applications, this often needs to be different from standard loss functions for machine learning.

Our main aim is to further the development of generally applicable techniques for utilizing usually good, but untrusted, predictions, while at the same time providing worst-case guarantees, in the realm of online optimization problems. We want to further establish this research topic at Danish universities and subsequently disseminate knowledge of it to industry via joint collaboration. Developments of this nature are of course pursued internationally. Progress is to a large extent made by considering carefully chosen concrete problems, their modeling and properties, extracting general techniques from those studies, and further testing their applicability on new problems.

We are planning to initiate work on online call control and scheduling with precedence constraints. The rationale is that these problems are important in their own right and at the same time represent different types of challenges. Call control focuses on admitting as many requests as possible with limited bandwidth, whereas scheduling focuses on time, handling all requests as effectively as possible.

Call control can be seen as point-to-point requests in a network with limited capacity. The goal is to accept as profitable a collection of requests as possible. Scheduling deals with jobs of different duration that must be executed on some “machine” (not necessarily a computer), respecting constraints specifying that some jobs cannot be executed before certain other jobs are completed. In this problem, all jobs must be scheduled on some machine, and the target is to complete all jobs as fast as possible. To fully define these problems, more details are required about the structure of the resources and the precise optimization goals.
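As a baseline illustration of the scheduling variant just described (plain greedy list scheduling, not the project's algorithm; all job names and numbers are invented), the sketch below assigns jobs with durations and precedence constraints to identical machines:

def list_schedule(durations, predecessors, machines):
    """Greedy list scheduling. durations: {job: processing time};
    predecessors: {job: set of jobs that must finish first};
    machines: number of identical machines. Assumes an acyclic precedence
    relation. Returns (makespan, start_times)."""
    waiting = {job: set(preds) for job, preds in predecessors.items()}
    finish, start = {}, {}
    free_at = [0] * machines                       # time each machine becomes free
    ready = [job for job, preds in waiting.items() if not preds]
    while ready:
        job = ready.pop()
        m = min(range(machines), key=lambda k: free_at[k])
        # a job starts once its machine is free and all its predecessors are done
        start[job] = max([free_at[m]] + [finish[p] for p in predecessors[job]])
        finish[job] = start[job] + durations[job]
        free_at[m] = finish[job]
        for other, preds in waiting.items():       # release newly ready jobs
            preds.discard(job)
            if not preds and other not in finish and other not in ready:
                ready.append(other)
    return max(finish.values()), start

# Job "c" may only start after "a" and "b" have completed; two machines.
makespan, _ = list_schedule({"a": 3, "b": 2, "c": 4},
                            {"a": set(), "b": set(), "c": {"a", "b"}},
                            machines=2)
print(makespan)   # 7

In the online version studied in the project, jobs would instead be revealed over time, and the assignment could additionally be guided by possibly erroneous predictions.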

Some generic insight we would like to gain, and which is sorely lacking in the community currently, is formalizable conditions for good predictions. We want the performance of algorithms to degrade gracefully with prediction errors. This is important for the explainability and trustworthiness of algorithms. Related to this, whereas some predictions may be easy to work with theoretically, it is important to focus on classes of predictions that are learnable in practice. To be useful, this also requires robustness, in the sense that minor, inconsequential changes in the input sequence compared with the prediction should not affect the result dramatically.

We are also interested in giving minor consideration to impossibility results, i.e., proving limits on how good solutions can be obtained. Whereas this is not directly constructive, it can tell us if we are done or how close we are to an optimal algorithm, so we do not waste time trying to improve algorithms that cannot be improved or only improved marginally.

The project leads to value creation in a number of different directions.

Research-wise, with the developments in machine learning and related data science disciplines over the last years, the integration and utilization of these techniques into other areas of computer science is of great interest, and Danish research should be at the forefront of these endeavors. We facilitate this by bringing people with expertise in different topics together and consolidating knowledge of the primary techniques across institutions. Educating students in these topics is a natural side-effect of running such a project. The primary focus, of course, is to educate the PhD student and train the research assistants, but this is accompanied by MS students working on their theses during the project period, solving related, well-defined subproblems.

We are advocating the combined techniques that strive towards excellent typical-case performance while providing worst-case guarantees, and believe that they should be adopted by industry to a larger extent. The project will lead to results on concrete problems, but our experience tells us that companies generally need variations of these or new solutions to somewhat different problems. Thus, the most important aspect in this regard is capacity building, so that we can assist with concrete developments for particular company-specific problems. Other than the fact that problems appear in many variations in different companies, a main reason why problem adaptation would often be necessary is that the added value of the combined algorithmic approaches is based on predictions, and it varies greatly what type of data is obtainable and which subset of the data can give useful predictions.

We have prior experience with industry consulting, the industrial PhD program, and co-advised MS students, and maintain close relationships with local industry. After, and in principle also during, this project, we are open to subsequent joint projects with industry that take their challenges as the starting point, whereafter we utilize the know-how and experience gained from the current project. Such work could be on a consultancy basis, through joint student projects, or, at a larger scale, with, for instance, the Innovation Foundation as a partner.

Finally, we see it as an advantage in our project that we include researchers that are relatively new to Denmark such that they get to interact with more people at different institutions and expand their Danish network.

September 1, 2022 – August 31, 2025 – 3 years.

Total budget DKK 3,5 million / DIREC investment DKK 1,5 million

Participants

Project Manager

Kim Skak Larsen

Professor

University of Southern Denmark
Department of Mathematics and Computer Science

E: kslarsen@imada.sdu.dk

Nutan Limaye

Associate Professor

IT University of Copenhagen
Department of Computer Science

Joan Boyar

Professor

University of Southern Denmark
Department of Mathematics and Computer Science

Melih Kandemir

Associate Professor

University of Southern Denmark
Department of Mathematics and Computer Science

Lene Monrad Favholdt

Associate Professor

University of Southern Denmark
Department of Mathematics and Computer Science

Partners


Benefit and Bias of Approximate Nearest Neighbor Search for Machine Learning and Data Mining

Project type: SCITECH Project

The search for nearest neighbors is a crucial ingredient in many applications such as density estimation, clustering, classification, and outlier detection. Often, neighborhood search is also the bottleneck in terms of efficiency in these applications. In the age of big data, companies and organizations can usually store billions of individual data points and embed these data points into a high-dimensional vector space. For example, the Danish company Pufin ID uses nearest neighbor search to link chemical labels placed on physical objects to a digital hash code. They require answers in milliseconds for such neighbor searches among nearly a billion high-dimensional vectors. Due to the curse of dimensionality, traditional, exact nearest neighbor search algorithms become the bottleneck of such applications and can take minutes or hours to answer a single query.

To solve such scalability challenges, more and more approximate nearest neighbor (ANN) search methods are employed. Depending on the data structure, the word “approximate” can mean either a strong theoretical guarantee or, more loosely, that results are expected to be inexact. Many applications of ANN-based methods have a profound societal influence on algorithmic decision-making processes. If a user sees a stream of personalized, recommended articles or a “curated” version of the timeline, the need for efficient processing makes it often necessary that these results are based on the selection of approximate nearest neighbors in an intermediate step. Thus, the bias, benefits, or dangers of such a selection process must be studied.

According to standard benchmarks, approximate methods can process queries several orders of magnitude faster than exact approaches, if results do not need to be close to exact. A downstream application of nearest neighbor search must take the inexact nature of the results into account. Different paradigms might come with different biases, and some paradigms might be more suitable for a certain use case. For example, recent work suggests that some ANN methods exhibit an “all or nothing” behavior, which causes the found neighbors to be completely unrelated. This can erode the trust of a user in the application. On the other hand, there exists work suggesting that ANN can improve the results of a downstream application, for example in the context of ensemble learning for outlier detection.

This project aims to use approximate nearest neighbor search to design highly scalable and robust algorithms for diverse tasks such as clustering, classification, and outlier detection.

Hypothesis
Many applications in machine learning and data mining can be sped up using approximate nearest neighbor search with no or only a negligible loss in result quality for the application, compared to an exact search. Different methods for ANN search come with different biases that can be positive or negative with varying degrees for the downstream application. In this project, the bias of different ANN methods and its impact on different applications will be studied from a fundamental and from an empirical level. We strive to address the following problems:

Theme 1
ANN for Discrimination Discovery and Diversity Maximization. Discrimination discovery and diversity maximization are central elements in the area of algorithmic fairness. Traditional classifiers include k-NN classifiers that scale poorly to high-dimensional data. On the other hand, diversity maximization usually involves a diversification of nearest neighbor search results.

Goals: Study the effect of ANN results on the quality of the k-NN classifier for discrimination discovery. Theoretically develop LSH-based diversity maximization methods that build the diversification into the LSH, and empirically evaluate them against other known approaches.

Theme 2
ANN for Outlier Detection. How do different ANN paradigms influence the quality of an outlier detection (OD) algorithm? Can outlier classification be “built into” an ANN algorithm to further scale up the performance?

Goals: Develop a theoretically sound LSH based outlier detection algorithm with provable guarantees; empirically compare the performance of different ANN-based OD classifiers; design and evaluate the performance of using different classifiers in an ensemble.

Theme 3
ANN for Clustering. Density-based and traditional clustering approaches rely on a nearest neighbor search or a range search to cluster data points. What is the effect of finding approximate neighbors? How well can we adopt different ANN paradigms to support range search operations? Related work uses LSH as a black-box: Can we use the LSH bucket structure to directly implement DBSCAN?

Goals: Extend graph-based ANN algorithms to support range-search primitives. Implement DBSCAN-based variants and evaluate their performance and quality.
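To make the black-box vs. bucket-structure distinction concrete, the following minimal sketch (our own illustration under simplifying assumptions, not the project's method) builds a p-stable Euclidean LSH index whose buckets directly supply the candidate sets for the eps-range queries that DBSCAN issues, followed by an exact distance filter:

import numpy as np

class EuclideanLSH:
    """p-stable (Euclidean) LSH: points whose projections fall into the same
    quantized bucket in at least one table become range-query candidates."""

    def __init__(self, dim, n_tables=8, n_hashes=4, w=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = w
        self.a = rng.normal(size=(n_tables, n_hashes, dim))    # random projections
        self.b = rng.uniform(0, w, size=(n_tables, n_hashes))  # random offsets
        self.tables = [dict() for _ in range(n_tables)]
        self.points = None

    def _keys(self, x):
        return [tuple(np.floor((self.a[t] @ x + self.b[t]) / self.w).astype(int))
                for t in range(len(self.tables))]

    def fit(self, X):
        self.points = np.asarray(X)
        for i, x in enumerate(self.points):
            for table, key in zip(self.tables, self._keys(x)):
                table.setdefault(key, []).append(i)

    def range_query(self, q, eps):
        # union of matching buckets = candidate set; exact filtering afterwards
        candidates = set()
        for table, key in zip(self.tables, self._keys(q)):
            candidates.update(table.get(key, []))
        return [i for i in candidates if np.linalg.norm(self.points[i] - q) <= eps]

# Usage: a DBSCAN implementation could call range_query instead of an exact
# eps-neighborhood search; the neighbors it misses are the "bias" to be studied.
X = np.random.default_rng(1).normal(size=(1000, 20))
index = EuclideanLSH(dim=20, w=2.0)
index.fit(X)
print(len(index.range_query(X[0], eps=3.0)))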

Theme 4
ANN to Speed-Up Machine Learning Training. Many training tasks in machine learning are costly. However, steps such as backpropagation boil down to a maximum inner product search (MIPS), for which we know that ANN provide efficient approximate solutions. In this task, we will study whether we can achieve comparable or better performance using ANN in the backpropagation step. Will the bias hurt the classification results or improve robustness?

Goals: Develop and evaluate neural network training using different ANN-based approaches to MIPS.
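For background on why ANN machinery applies here, the sketch below shows the standard reduction from maximum inner product search to Euclidean nearest-neighbour search (an augmentation in the spirit of Neyshabur and Srebro); it is generic textbook machinery, not the training method the project will develop.

import numpy as np

def mips_transform(X):
    """Append one coordinate so that the Euclidean nearest neighbour of a
    transformed query equals the maximum-inner-product point in the original data."""
    norms = np.linalg.norm(X, axis=1)
    M = norms.max()
    extra = np.sqrt(M**2 - norms**2)        # pads every vector to norm M
    return np.hstack([X, extra[:, None]])

def query_transform(q):
    return np.append(q, 0.0)                # query gets a zero in the extra coordinate

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))
q = rng.normal(size=16)

exact_mips = np.argmax(X @ q)
nn_after_transform = np.argmin(np.linalg.norm(mips_transform(X) - query_transform(q), axis=1))
print(exact_mips == nn_after_transform)     # True: the reduction preserves the argmax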

Risk Management
The research themes mentioned above can mostly be carried out independently of each other. The actual downstream application is relatively flexible, which lowers the risk of the project failing. The hiring process will make sure that the prospective PhD student has both a theoretical understanding of algorithms and data mining and practical experience with programming. As a fallback, if theoretical results turn out not to be within reach, the empirical results will improve the state of the art and imply strong results for venues with an empirical focus, yield demonstrations at such venues, and result in open-source software to make the methods available to a broad audience.

Scientific value
The scientific value of the project is a fundamental understanding of the influence of approximate nearest neighbor search on applications in machine learning and data mining, such as outlier detection, clustering and algorithmic decision making. Through this project, we will propose new algorithmic methods and provide efficient implementations to solve important machine learning and data mining tasks. In the spirit of open science and to maximize the impact of the scientific results, all software resulting from the project will be made available open source. As a long-term goal, our results will show that, when handled with care in the design and rigor in the analysis, approximate methods allow the design of scalable algorithms that do not necessarily lose in quality. Of course, this might be true not only for the areas covered in this project, but for many others where exact solutions are computationally out of reach.

Capacity building
In terms of capacity building the value of the project is to educate a PhD student. Such a student will be able to work both on a theoretical and an applied level. She will also be trained in critical thinking on algorithmic decision making, which is a highly valuable skill for society. In addition, the project will offer several affiliated student projects on a Bachelor’s and Master’s level, and the availability of the research results will make it easy for others to build upon the work. The long-term goal of this project is to attract the interest of companies to use these methods and develop them further, aiming for follow-up projects with industry partners on a larger scale.

Societal value
The rise of vector embedding methods for text, images, and video has had a deep impact on society. Many of its applications, such as personalized recommendations or curated news feeds, are taken for granted, but are only made possible through efficient search methods. Thus, ANN-based methods have allowed us to design algorithmic decision-making processes with profound influence on our everyday life. If a user sees a stream of personalized, recommended articles or a “curated” version of their social media feed, it is very likely that these results are based on the selection of approximate nearest neighbors in an intermediate step. The bias, benefits, and dangers of such a selection process must be studied carefully. Moreover, a successful application of approximate techniques has the potential for liberating the use of methods such as deep learning by lowering the entry cost in terms of hardware. This is, for example, showcased by the recently founded start-up ThirdAI.

August 2022 – December 31, 2025 – 3,5 years.

Total budget DKK 3,5 million / DIREC investment DKK 1,77 million

Participants

Project Manager

Martin Aumüller

Associate Professor

IT University of Copenhagen
Department of Computer Science

E: maau@itu.dk

Project Manager

Arthur Zimek

Professor

University of Southern Denmark
Department of Mathematics and Computer Science

E: zimek@imada.sdu.dk

Partners


Low-Code Programming of Spatial Contexts for Logistic Tasks in Mobile Robotics

Project type: Bridge Project

An unmet need in industry is flexibility and adaptability of manufacturing processes in low-volume production. Low-volume production represents a large share of the Danish manufacturing industry. Existing solutions for automating industrial logistics tasks include combinations of automated storage, conveyor belts, and mobile robots with special loading and unloading docks. However, these solutions require major investments and are not cost efficient for low-volume production.

Therefore, low-volume production is today labor intensive, as automation technology and software are not yet cost effective for such production scenarios where a machine can be operated by untrained personnel. The need for flexibility, ease of programming, and fast adaptability of manufacturing processes is recognized in both Europe and the USA. EuRobotics highlights the need for systems that can be easily re-programmed without the use of skilled system configuration personnel. Furthermore, the American roadmap for robotics highlights adaptable and reconfigurable assembly and manipulation as an important capability for manufacturing.

The company Enabled Robotics (ER) aims to provide easy programming as an integral part of their products. Their mobile manipulator ER-FLEX consists of a robot arm and a mobile platform. The ER-FLEX mobile collaborative robot provides an opportunity to automate logistic tasks in low-volume production. This includes manipulation of objects in production in a less invasive and more cost-efficient way, reusing existing machinery and traditional storage racks. However, this setting also challenges the robots due to the variability in rack locations, shelf locations, box types, object types, and drop off points.

Today the ER-FLEX can be programmed by means of block-based features, which can be configured into high-level robot behaviors. While this approach offers an easier programming experience, the operator must still have a good knowledge of robotics and programming to define the desired behavior. In order to make the product accessible to a wider audience of users in low-volume production companies, robot behavior programming has to be defined in a simpler and more intuitive manner. In addition, a solution is needed that addresses the variability in a time-efficient and adaptive way to program the 3D spatial context.

Low-code software development is an emerging research topic in software engineering. Research in this area has investigated the development of software platforms that allow non-technical people to develop fully functional application software without having to use a general-purpose programming language. The scope of most low-code development platforms, however, has been limited to creating software-only solutions for business process automation of low-to-moderate complexity.

Programming of robot tasks still relies on dedicated personnel with special training. In recent years, the emergence of digital twins, block-based programming languages, and collaborative robots that can be programmed by demonstration has brought a breakthrough to this field. However, existing solutions still lack the ability to address variability when programming logistics and manipulation tasks in an ever-changing environment.

Current low-code development platforms do not support robotic systems. The extensive use of hardware components and sensor data in robotics makes it challenging to translate low-level manipulations into a high-level language that is understandable for non-programmers. In this project we will tackle this by constraining the problem to focus on the spatial dimension and by using machine learning for adaptability. Therefore, the first research question we want to investigate in this project is whether and how the low-code development paradigm can support robot programming of spatial logistic tasks in indoor environments. The second research question will address how to apply ML-based methods for remapping between high-level instructions and the physical world to derive and execute new task-specific robot manipulation and logistic actions.

Therefore, the overall aim of this project is to investigate the use of low-code development for adaptive and re-configurable robot programming of logistic tasks. Through a case study proposed by ER, the project builds on SDU's previous work on domain-specific languages (DSLs) to propose a solution for high-level programming of the 3D spatial context in natural language, and will work on using machine learning for adaptable programming of robotic skills. RUC will participate in the project with interaction competences to optimize the usability of the approach.
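As a purely hypothetical illustration of what such a high-level, spatially anchored task description might look like once compiled down to robot primitives, consider the sketch below; the data types, location names, and primitive actions are invented for this example and do not reflect ER's or SDU's actual languages.

from dataclasses import dataclass

@dataclass
class Location:
    rack: str
    shelf: int

@dataclass
class PickAndPlace:
    item: str
    source: Location
    target: str                    # e.g. a named drop-off station

def compile_task(task: PickAndPlace):
    """Translate one high-level step into a sequence of robot primitives.
    In a real system, a learned component could resolve the symbolic Location
    to actual coordinates, absorbing variability in rack and shelf positions."""
    return [
        ("navigate_to", task.source.rack),
        ("detect_and_grasp", task.item, task.source.shelf),
        ("navigate_to", task.target),
        ("place", task.item),
    ]

task = PickAndPlace(item="box-17", source=Location(rack="R2", shelf=3),
                    target="assembly-station-1")
for primitive in compile_task(task):
    print(primitive)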

Our research methodology for solving this problem is oriented towards design science, which provides a concrete framework for dynamic validation in an industrial setting. For the problem investigation, we are planning a systematic literature review of existing solutions that address the issues of 3D space mapping and variability of logistic tasks. For the design and implementation, we will first address the requirement of building a spatial representation of the task conditions and the environment using external sensors, which will give us a map for deploying the ER platform. Furthermore, to minimize the input that users need to provide to link the programming parameters to the physical world, we will investigate and apply sensor-based user interface technologies and machine learning. The designed solutions will be combined into the low-code development platform that will allow for high-level robot programming.

Finally, for validation the resultant low-code development platform will be tested for logistics-manipulation tasks with the industry partner Enabled Robotics, both at a mockup test setup which will be established in the SDU I4.0 lab and at a customer site with increasing difficulty in terms of variability.

Making it easier to program robotic solutions enables both new users of the technology and new use cases. This contributes to DIREC's long-term goal of building up research capacity, as this project focuses on building the competences necessary to address challenges within software engineering, cyber-physical systems (robotics), interaction design, and machine learning.

Scientific value
The project’s scientific value is to develop new methods and techniques for low-code programming of robotic systems with novel user interface technologies and machine learning approaches to address variability. This addresses the lack of approaches for low-code development of robotic skills for logistic tasks. We expect to publish at least four high-quality research articles and to demonstrate the potential of the developed technologies in concrete real-world applications.

Capacity building
The project will build and strengthen the research capacity in Denmark directly through the education of one PhD candidate, and through the collaboration between researchers, domain experts, and end-users that will lead to R&D growth in the industrial sector – in particular, research competences at the intersection of software engineering and robotics that support the digital foundation for this sector.

Societal and business value
The project will create societal and business value by providing new solutions for programming robotic systems. A 2020 market report predicts that the market for autonomous mobile robots will grow from 310M DKK in 2021 to 3,327M DKK in 2024, with inquiries from segments such as semiconductor manufacturers, automotive, automotive suppliers, pharma, and manufacturing in general. ER wants to tap into these market opportunities by providing an efficient and flexible solution for internal logistics. ER would like to position its solution with benefits such as making logistics smoother and programmable by a wide customer base while alleviating problems with shortage of labor. This project enables ER to improve their product with regard to key parameters. The project will provide significant societal value and directly contribute to SDG 9 (build resilient infrastructure, promote inclusive and sustainable industrialization, and foster innovation).

In conclusion, the project will provide a strong contribution to the digital foundation for robotics based on software competences and support Denmark being a digital frontrunner in this area.

September 1, 2022 – December 31, 2025 – 3,5 years.

Total budget DKK 7,15 million / DIREC investment DKK 1,97 million

Participants

Project Manager

Thiago Rocha Silva

Associate Professor

University of Southern Denmark
Maersk Mc-Kinney Moller Institute

E: trsi@mmmi.sdu.dk

Aljaz Kramberger

Associate Professor

University of Southern Denmark
Maersk Mc-Kinney Moller Institute

Mikkel Baun Kjærgaard

Professor

University of Southern Denmark
Maersk Mc-Kinney Moller Institute

Mads Hobye

Associate Professor

Roskilde University
Department of People and Technology

Lars Peter Ellekilde

Chief Executive Officer

Enabled Robotics ApS

Partners


Privacy-Preserving and Software-Independent Voting Protocols

Project type: Bridge Project

Five considerations explain the unmet needs addressed by this proposed project:

  1. Voting protocols, both in the form of Voting Governance Protocols and Internet Voting Protocols, have become increasingly popular and will be more widely deployed as a result of an ongoing digitalization effort of democratic processes, further driven by the current pandemic.
  2. Elections are based on trust, which means that election systems ideally should be based on algorithms and data structures that are already trusted. Blockchains provide such a technology. They provide a trusted bulletin board, which can be used as part of voting.
  3. Voting crucially depends on establishing the identity of the voter to avoid fraud and to establish eligibility verifiability.
  4. Any implementation created by a programmer, be it a Voting Governance Protocol or an Internet Voting Protocol can have bugs that quickly erode public confidence. Proof assistants are established tools that help to avoid large classes of common programming mistakes.
  5. Greenland's laws were recently changed to allow for Internet Voting.

Having said all of this, tackling these unmet needs is a real research challenge. Decades of research in voting protocols have shown how difficult it is to combine the privacy of the vote with the auditability of the election outcome. It is easy to achieve one without the other, but hard to combine both into one protocol. Thus, the topic of this research proposal is to study voting protocols that are privacy-preserving and software-independent in the sense of Rivest and Wack's definition: “A voting system is software-independent if an undetected change or error in its software cannot cause an undetectable change or error in an election outcome.” No such protocol is known to exist for online voting. In future work, we expect to apply the knowledge gained in this proposed research project more broadly to other security protocols.

The proposed research project aims to shed more light on the overall research question of whether, and what role, blockchain technologies should play in the design of software-independent Voting Governance Protocols and Internet Voting Protocols in theory and practice. Answering this question in the affirmative would lead to a new generation of voting protocols that derive trust in the election outcome from trust in the blockchain; such protocols would increase public confidence in the proper treatment of voter eligibility and deliver technology for post-conflict and developing countries, where the population has little trust in paper evidence. This would trigger further innovation in the Voting Governance Protocol and Internet Voting markets. To answer this research question, we structure the research into two research objectives, which we elaborate on next.

  • (RO1) Explore the notions of software-independence, verifiability, and accountability in the context of blockchain voting protocols.
  • (RO2) Map the concept of vote privacy to privacy preservation in blockchains and investigate how to scale this to a formally verified and software-independent voting protocol.

Research methodology. In order to achieve (RO1), we will consider two theories of what constitutes software-independence. There is the game-theoretic view, which, similar to proofs by reduction and simulation in cryptography, reduces software-independence of one protocol to another. The genesis protocol originally advocated by Rivest bases trust entirely on paper evidence, but there are alternatives based on digital evidence, testing, and statistics. We plan to work out what software-independence actually means for blockchain voting protocols, and to study the resulting formal models of software-independence using proof assistants, in order to give even stronger software-independence guarantees. For all voting protocols that we design within this project, we will develop formal proofs of software independence, verifiability, and accountability.
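
To make the role of proof assistants concrete, here is a minimal sketch in Lean of one possible reading of the Rivest–Wack definition quoted above; the names (VotingSystem, tally, audit, detect) are illustrative assumptions and not part of the project’s formal development.

    structure VotingSystem (Ballot Outcome Evidence : Type) where
      tally : List Ballot → Outcome   -- computes the election outcome from the cast ballots
      audit : List Ballot → Evidence  -- produces the auditable evidence trail

    /-- An undetected change of software (from `reference` to `modified`) cannot cause
        an undetectable change in the outcome: whenever the outcomes differ, the
        evidence produced by the two systems is distinguishable by `detect`. -/
    def SoftwareIndependent {Ballot Outcome Evidence : Type}
        (reference modified : VotingSystem Ballot Outcome Evidence)
        (detect : Evidence → Evidence → Bool) : Prop :=
      ∀ bs : List Ballot,
        reference.tally bs ≠ modified.tally bs →
        detect (reference.audit bs) (modified.audit bs) = true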

To achieve (RO2), we start from the assumption that the blockchain provides sufficient privacy guarantees. We piggy-back on blockchains that have a clear formal definition of what is meant by privacy and that are mechanically proven correct. Based on the results, we will then reconsider the designs of existing voting protocols and design new voting protocols by cherry-picking the best elements, with the goal of achieving a software-independent protocol. A formal definition of software-independence and a mechanized proof of correctness will be developed in this work package. Time permitting, we will extend our notion of software independence to other guarantees, including receipt freeness, coercion mitigation, and dispute resolution.

For a secure implementation, one needs to make sure that the deployed code correctly implements the protocol. We aim to automatically extract an executable, verified smart contract from the formal model developed. The Concordium blockchain provides a secure and private way to put credentials, such as passport information, on the internet. We will investigate how to reuse such blockchain-based identities for voting. Based on the results, we propose to develop an open-source library that makes our verified blockchain voting technology available for use in third-party products. We envision releasing a product similar to ElectionGuard (provided by Microsoft), but with a blockchain functioning as a public bulletin board.

Scientific value
Internet voting provides a unique collection of challenges, such as vote privacy, software independence, receipt freeness, coercion resistance, and dispute resolution. Subsets of them can be solved separately; here we aim to guarantee vote privacy and software independence by means of a privacy-preserving and accountable blockchain and to formally verify the resulting voting protocol. The resulting voting protocol will be different from existing ones, because it builds on formally verified properties that are guaranteed by the choice of blockchain.

Capacity building
The proposed project pursues two kinds of capacity building. First, by training the PhD student and university students affiliated with the project, making Denmark a leading place for secure Internet voting. Second, if successful, the results of the project will contribute to the Greenland voting project and to international capacity building in the sense that they will strengthen democratic institutions.

Business value
The project is highly interesting to and relevant for industry. There are two reasons why it is interesting for Concordium. On the one hand, voting is an excellent application demonstrating the vision of the blockchain, and on the other hand, Concordium will, as part of the project, implement a voting scheme to be used for decentralized governance of the blockchain. More precisely, the Concordium blockchain is designed to support applications where users can act privately while maintaining accountability and meeting regulatory requirements. Furthermore, it is an explicit goal of Concordium to support formally verified smart contracts. All these goals fit nicely with the proposed project, and it will be important for Concordium to demonstrate that the blockchain actually supports the secure voting schemes developed in the project. With respect to governance, Concordium needs to develop a strong voting scheme allowing members of its community to vote on proposed features and to elect members of the Governance Committee. The project is also of great interest to the Alexandra Institute, which will apply and improve its in-house capacity for implementing cryptographic algorithms. The involvement of Alexandra will guarantee that the theoretical findings of the proposed project will be translated into usable real-world products and disseminated further to Internet Voting providers that may eventually provide a voting solution to Greenland.

Societal value
Some nations are rethinking their electoral processes and the ways they hold elections. Since the start of the pandemic, approximately a third of all nations scheduled to hold a national election have postponed it. It is therefore not surprising that countries are exploring Internet Voting as an additional voting channel. The results of this project would contribute to making Internet elections more credible, and therefore strengthen developing and post-conflict democracies around the world. The election commission in Greenland, a partner in this proposed project, is currently actively pursuing the development and deployment of an Internet Voting system.

January 1, 2023 – December 31, 2025 – 3 years.

Total budget DKK 12,09 million / DIREC investment DKK 3,6 million

Participants

Project Manager

Carsten Schürmann

Professor

IT University of Copenhagen
Department of Computer Science

E: carsten@itu.dk

Bas Spitters

Associate Professor

Aarhus University
Department of Computer Science

Gert Læssøe Mikkelsen

Head of Security Lab

The Alexandra Institute

Kåre Kjelstrøm

Chief Technology Officer

Concordium ApS

Klaus Georg Hansen

Head of Division

Government of Greenland

Bernardo David

Associate Professor

IT University of Copenhagen

Diego Aranha

Associate Professor

Aarhus University
Department of Computer Science

Tore Kasper Frederiksen

Senior Cryptography Engineer

The Alexandra Institute

Ron Rivest

Professor

MIT

Philip Stark

Professor

University of California, Berkeley

Peter Ryan

Professor, Dr.

University of Luxembourg

Partners

Categories
Bridge project

Multimodal Data Processing of Earth Observation Data

Project type: Bridge Project

Multimodal Data Processing of Earth Observation Data

The Danish partnership for digitalization has concluded that there is a need to support the digital acceleration of the green transition. This includes strengthening efforts to establish a stronger data foundation for environmental data. Based on observations of the Earth, a range of Danish public organizations build and maintain important data foundations. Such foundations are used for decision making, e.g., for executing environmental law or making planning decisions in both private and public organizations in Denmark.

The increasing possibilities of automated data collection and processing can decrease the cost of creating and maintaining such data foundations and enable service improvements that provide more accurate and richer information. To realize such benefits, public organizations need to be able to utilize the new data sources that become available, e.g., to automate manual data curation tasks and increase the accuracy and richness of data. However, the organizations are challenged by the available methods’ ability to efficiently combine the different sources of data for their use cases. This is particularly the case when user-facing tools must be constructed on top of the data foundation. The availability of better data for end-users will, among other things, help users decrease the cost of executing environmental law and making planning decisions. In addition, the ability of public data sources to provide more value to end-users improves the societal return on investment for publishing these data, which is in the interest of the public data providers as well as their end-users and society at large.

The Danish Environmental Protection Agency (EPA) has the option to receive data from many data sources but does not utilize this today, because the lack of infrastructure makes it cost-prohibitive to take advantage of the data. Therefore, the EPA is expressing a need for methods to enable a data hub that provides data products combining satellite, orthophoto and IoT data. The Danish GeoData Agency (GDA) collects very large quantities of Automatic Identification System (AIS) data from ships sailing in Denmark. However, they currently use this data only to a very limited degree. The GDA needs methods to enable a data hub that combines multiple sources of ship-based data, including AIS data, ocean observation data (sea level and sea temperature) and meteorological data. There is a need for analytics on top that can provide services for estimating travel time at sea or finding the most fuel-efficient routes. This includes estimating the potential for lowering CO2 emissions at sea by following efficient routes.

Geo supports professional users in performing analysis of subsurface conditions based on their own extensive data, gathered from tens of thousands of geotechnical and environmental drilling operations, and on public sources. They deliver a professional software tool that presents this multimodal data in novel ways and are actively working on creating an educational platform giving high school students access to the same data. Geo has an interest in and need for methods for adding live, multimodal data to their platform, to support both professional decision makers and students. Furthermore, they need novel ways of querying and representing such data, to make it accessible to professionals and students alike. Creating a testbed that combines Geo’s data with satellite feeds and automated processing to interpret this data will create new synergies and has the potential to greatly improve visualizations of the subsurface by building detailed, regional and national 3D voxel models.

Therefore, the key challenges that this project will address are how to construct scalable data warehouses for Earth observation data, how to design systems for combining and enriching multimodal data at scale, and how to design user-oriented data interfaces and analytics to support domain experts, thereby helping the organizations produce better data for the benefit of the green transition of Danish society.

The aim of the project is to do use-inspired basic research on methods for multimodal processing of Earth observation data. The research will cover the areas of advanced and efficient big data management, software engineering, Internet of Things and machine learning. The project will conduct research in these areas in the context of three domain cases: with GDA on sea data and with EPA and Geo on environmental data.

Scalable data warehousing is the key challenge that the work within advanced and efficient big data management will address. The primary research question is how to build a data warehouse with billions of rows of all relevant domain data. AIS data from GDA will be studied, and in addition to storage, data cleaning will also be addressed. On top of the data warehouse, machine learning algorithms must be enabled to compute the fastest and most fuel-efficient route between two arbitrary destinations.
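
To give a flavour of the data-cleaning step mentioned above, here is a minimal sketch assuming a simplified record format (mmsi, timestamp, lat, lon, sog); the column names and thresholds are illustrative assumptions, not GDA’s warehouse schema.

    import pandas as pd

    def clean_ais(df: pd.DataFrame) -> pd.DataFrame:
        """Drop rows with missing keys, out-of-range positions or implausible speeds."""
        df = df.dropna(subset=["mmsi", "timestamp", "lat", "lon"])
        df = df[df["lat"].between(-90, 90) & df["lon"].between(-180, 180)]
        df = df[df["sog"].between(0, 50)]  # speed over ground above 50 knots is implausible
        return df.drop_duplicates(subset=["mmsi", "timestamp"]).sort_values(["mmsi", "timestamp"])

    # Example with a small in-memory sample; the last row has an invalid latitude and is dropped
    sample = pd.DataFrame({
        "mmsi": [219000001, 219000001, 219000002],
        "timestamp": pd.to_datetime(["2022-09-01 10:00", "2022-09-01 10:01", "2022-09-01 10:00"]),
        "lat": [56.1, 56.1, 95.0],
        "lon": [11.5, 11.6, 10.0],
        "sog": [12.0, 12.5, 9.0],
    })
    print(clean_ais(sample))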

Processing pipelines for multimodal data processing is the key topic for the work within software engineering, Internet of Things and machine learning. The primary research question is how to engineer data processing pipelines that allow for enriching data through processes of transformation and combination. In the EPA case there is a need for enriching data by combining data sources, both across sources (e.g., satellite and drone) and across modalities (e.g., the NDVI index for quantifying vegetation greenness is a function over a red and a near-infrared band). Furthermore, we will research methods for easing the process of bringing disparate data into a form that can be inspected both by a human and by an AI user. For example, data sources are automatically cropped to a polygon representing a given area of interest (such as a city, municipality or country), normalized for comparability and subjected to data augmentation, in order to improve machine learning performance. We will leverage existing knowledge on graph databases. We aim to facilitate the combination of satellite data with other sources like sensor recordings at specific geo locations. This allows for advanced data analysis of a wide variety of phenomena, like detection and quantification of objects and changes over time, which in turn allows for prediction of future occurrences.
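
As an example of one such enrichment step, here is a minimal sketch of computing NDVI from a red and a near-infrared band; the array shapes and values are illustrative assumptions and not part of the project’s pipeline.

    import numpy as np

    def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
        """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red), in [-1, 1]."""
        nir = nir.astype(np.float64)
        red = red.astype(np.float64)
        denom = nir + red
        out = np.zeros_like(denom)
        np.divide(nir - red, denom, out=out, where=denom != 0)  # guard against division by zero
        return out

    # Example: two 3x3 band rasters covering the same (hypothetical) area of interest
    nir_band = np.array([[0.6, 0.7, 0.5], [0.8, 0.6, 0.4], [0.7, 0.9, 0.6]])
    red_band = np.array([[0.1, 0.2, 0.1], [0.3, 0.2, 0.1], [0.2, 0.1, 0.2]])
    print(ndvi(nir_band, red_band))  # higher values indicate greener vegetation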

User-oriented data hubs and analytics is a cross-cutting topic with the aim of designing interfaces and user-oriented analytics on top of the data warehouses and processing pipelines. In the EPA case the focus is on developing a Danish data hub with Earth observation data. The solution must provide a uniform interface for working with the data, offering a user-centric view of the data representation. This will then enable decision support systems, which will be worked on in the Geo case, that may be augmented by artificial intelligence and made understandable to human users through explorative graph-based user interfaces and data visualizations. For the GDA case the focus is on a web frontend for querying AIS data as trajectories and heat maps and for estimating the travel time between two points in Danish waters. As part of the validation, the data warehouse and related services will be deployed at GDA and serve as the foundation for future GDA services.
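
To illustrate the kind of travel-time estimate such a frontend could expose, here is a minimal sketch based on great-circle distances along a route and an average speed derived from historical AIS observations; the route coordinates and speed are invented for illustration and are not GDA data.

    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance in kilometres between two (lat, lon) points."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(a))

    def estimate_travel_hours(route, avg_speed_knots):
        """route: list of (lat, lon) waypoints; avg_speed_knots: speed observed on similar AIS trajectories."""
        distance_km = sum(haversine_km(*route[i], *route[i + 1]) for i in range(len(route) - 1))
        return distance_km / (avg_speed_knots * 1.852)  # 1 knot = 1.852 km/h

    # Example: a short (hypothetical) route in Danish waters at an assumed 12-knot average speed
    route = [(57.05, 10.60), (56.70, 11.20), (56.15, 11.60)]
    print(f"Estimated travel time: {estimate_travel_hours(route, 12.0):.1f} hours")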

Advancing the means to process, store and use Earth observation data has many potential domain applications. In line with DIREC’s long-term goal of building world-class computer science research and innovation centres, this project focuses on building the competencies necessary to address challenges with Earth observation data, building on advances in advanced and efficient big data management, software engineering, Internet of Things and machine learning.

Scientific value
The project’s scientific value is the development of new methods and techniques for scalable data warehousing, processing pipelines for multimodal data and user-oriented data hubs and analytics. We expect to publish at least seven rank A research articles and to demonstrate the potential of the developed technologies in concrete real-world applications.

Capacity building
The project will build and strengthen research capacity in Denmark directly through the education of two PhDs, and through the collaboration between researchers, domain experts, and end-users that will lead to R&D growth in the public and industrial sectors. Research competences that support a stronger digital foundation for the green transformation are important for Danish society and the associated industrial sectors.

Societal and business value
The project will create societal and business value by providing the foundation for the Blue Denmark to reduce environmental and climate impact in Danish and Greenlandic waters and thereby help support the green transformation. With ever-increasing human activity at sea, growing transportation of goods (90% of which is transported by shipping), and the goal of a carbon-neutral European economy, there is a need to activate marine data to support this transformation. For the environmental protection sector, the project will provide the foundation for efforts to increase biodiversity in Denmark through better protection of fauna types and data-supported execution of environmental law. The project will provide significant societal value and directly contribute to SDGs 13 (climate action), 14 (life below water) and 15 (life on land).

In conclusion, the project will provide a strong contribution to the digital foundation for the green transition and support Denmark being a digital frontrunner in this area.

September 1, 2022 – August 31, 2025 – 3 years.

Total budget DKK 12,27 million / DIREC investment DKK 3,6 million

Participants

Project Manager

Kristian Torp

Professor

Aalborg University
Department of Computer Science

E: torp@cs.aau.dk

Christian S. Jensen

Professor

Aalborg University
Department of Computer Science

Thiago Rocha Silva

Associate Professor

University of Southern Denmark
Maersk Mc-Kinney Moller Institute

Aslak Johansen

Associate Professor

University of Southern Denmark
Maersk Mc-Kinney Moller Institute

Sarah Lønholt

Special consultant

Danish Environmental Protection Agency

Mads Darø Kristensen

Principal Application Architect

The Alexandra Institute

Søren Krogh Sørensen

Software Developer

The Alexandra Institute

Oliver Hjermitslev

Visual Computing Specialist

The Alexandra Institute

Mads Robenhagen Mølgaard

Department Director

GEO
Geodata & Subsurface Models

Ove Andersen

Special Consultant

Danish Geodata Agency

Niels Tvilling Larsen

Head of Department

Danish Geodata Agency
Danish Hydrographic Office

Partners

Categories
News

Digitalisation can definitely boost the green transition

13 July 2022

Digitalisation can definitely boost the green transition

Artificial intelligence and algorithms can help calculate how we can best heat our homes, produce efficiently, transport with the least possible energy consumption, and make optimal use of the IT infrastructure as part of the green transition. But it requires that we dare to delegate more tasks to algorithms and invest more in research and development.

Categories
News Phd school

MOVEP 2022: Five Intensive Days on Modelling and Verification

17 JUNE 2022

MOVEP 2022: Five Intensive Days on Modelling and Verification

Automated systems like self-driving cars and AI-based decision support are becoming an increasingly large part of our everyday lives, and so is the need for modelling and verification of the software running these systems. At the MOVEP 2022 Summer School, hosted by the Department of Computer Science, Aalborg University, leading researchers, students and industry practitioners convened to discuss challenges and opportunities within this field.

By Stig Andersen, Aalborg University

The five-day MOVEP Summer School 2022 (June 13-17) on modelling and verification of parallel processes had attracted 70+ participants, primarily PhD students, but also people from industry.

With the lecture hall of the Department of Architecture, Design and Media Technology right at Aalborg’s harbour front as a great venue, they enjoyed a packed programme of talks and tutorials from 11 leading researchers on model checking, controller synthesis, software verification, temporal logics, real-time and hybrid systems, stochastic systems, security, run-time verification, etc.

An exciting field

One of the speakers was Christel Baier, Professor and Head of the Chair for Algebraic and Logic Foundations of Computer Science at the Faculty of Computer Science of Technische Universität Dresden, and, together with Joost-Pieter Katoen, the author of a key publication in the field, Principles of Model Checking (MIT Press, 2008). She has been working within the broad field of verification and analysis techniques for stochastic operational models for more than twenty years.

– I really had not expected to work so long within this area, but as it often turns out in science, apparently simple problems are not at all simple and will require more research. So, if the students at this summer school would take the message that this is an exciting and very important field and choose to explore it further, I would be very happy. MOVEP is a very nice event, and being able to come to Denmark and not least being able to meet again after the Corona shutdown is really great, she says.

Application in different fields

Another speaker was Nir Piterman, Professor in the Department of Computer Science and Engineering, University of Gothenburg and Chalmers, and a prominent figure within formal verification and automata theory. He kicked off the summer school programme Monday morning with a tutorial on reactive synthesis, which is a technique for automatically generating correct-by-construction reactive systems from high-level descriptions.

 – In my tutorial, I tried to give the participants a taste of the so-called discrete two-player turn-based games technique, where you think about the environment as one player and the system as another player. The interaction is like a game between the two, and the system has to come up with a strategy to satisfy some goal, he explains.
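
To give a concrete flavour of the two-player turn-based games Nir Piterman describes, here is a minimal sketch of the standard attractor construction for a reachability objective; the state space and edges are invented for illustration and are not taken from his tutorial.

    def attractor(states, system_states, edges, goal):
        """Return the states from which the system player can force the play into `goal`.
        states: iterable of states; system_states: states where the system moves;
        edges: dict mapping each state to its successor states; goal: set of target states."""
        win = set(goal)
        changed = True
        while changed:
            changed = False
            for s in states:
                if s in win:
                    continue
                succs = edges.get(s, [])
                if s in system_states:
                    ok = any(t in win for t in succs)                  # system picks one good move
                else:
                    ok = bool(succs) and all(t in win for t in succs)  # environment cannot escape
                if ok:
                    win.add(s)
                    changed = True
        return win

    # Example: system moves at states 0 and 2, environment at 1; the goal is state 3
    states = [0, 1, 2, 3]
    edges = {0: [1, 2], 1: [0, 3], 2: [3], 3: []}
    print(attractor(states, {0, 2}, edges, goal={3}))  # {0, 1, 2, 3}: the system wins from every state

A winning strategy for the system can be read off from this computation by always moving to a successor that is already in the winning set.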

Nir Piterman also sees an event like MOVEP as a very good opportunity for young researchers to be exposed to concepts and techniques that they would not necessarily be exposed to otherwise.

– It is my hope that the talks and tutorials at this event will fertilize their work and provide them with new ideas about how to apply these techniques in different fields. One possible usage of two-player games is synthesis, but the usage could be wider and potentially applied to other problems, he says.

Nir Piterman is currently the holder of an ERC consolidator grant to study the usage of reactive synthesis for multiple collaborating programs.

Explainability

In her tutorial, Christel Baier focused on explication, which refers to a mathematical concept that in some way sheds light on why a verification process has returned a given result.

– Explainability is important. We have to make systems more understandable to everyone – scientists, designers, users, etc. Today, everybody is an IT user, so this is not only relevant for computer scientists, she says.

According to Christel Baier, there is a higher purpose:

– Since systems make decisions, users should have the opportunity to understand why decisions were made. Moreover, users should be supported in making decisions by themselves and be given an understanding of the configuration of these systems and their possible effects. Again, it comes down to the question of cause and effect, which was a recurring theme of my tutorial.

The research behind the results presented by Christel Baier in her tutorial has been carried out within, and is motivated by the missions of, the collaborative projects “Center for Perspicuous Computing (CPEC)” and “Centre for Tactile Internet with Human-in-the-Loop (CeTI)”.

Correct-by-construction

Research within modelling and verification of parallel processes may also explore the question: Could we automatically generate systems that perform exactly according to the specifications instead of checking afterwards that they do? Nir Piterman dealt with this topic in his tutorial.

– Techniques to automatically generate correct-by-construction reactive systems from high-level descriptions have been explored in academia for quite a number of years. It has proven to work in some domains, but it would not be realistic to set as an ambition to build one synthesizer that you feed a specification to and expect it to auto-generate safe and error-free systems for all possible programming domains, he says.

According to Nir Piterman, the most successful applications so far have been within robotics. However, this success prompts us to think about what correct-by-construction really means.

– What does “correct” really mean? If it means that the system does exactly what was described in the specification, what happens if the specification is flawed? So, the focus of the correctness problem might change: Rather than making sure that the system matches the specification, the task is to ensure that the specification is thorough enough and reflects what the designer had in mind.

FURTHER INFORMATION

  • MOVEP 2022 is hosted by the Department of Computer Science, Aalborg University (primary organizer Martin Zimmermann, Associate Professor) and co-sponsored by DIREC and S4OS.
  • The first five editions of MOVEP took place in Nantes (France) every other year from 1994 to 2002. It then moved to Brussels (Belgium) in 2004, Bordeaux (France) in 2006, Orléans (France) in 2008, Aachen (Germany) in 2010, Marseille (France) in 2012, Nantes (France) in 2014, Genova (Italy) in 2016, Cachan (France) in 2018 and online in 2020.
  • More info on the MOVEP 2022 website.

CONTACT
Martin Zimmermann
Associate Professor
Department of Computer Science
Aalborg University
Mail: mzi@cs.aau.dk
Phone: +45 9940 8770

Stig Andersen
Communications Officer
Department of Computer Science
Aalborg University
Mail: stan@cs.aau.dk
Phone: +45 4019 7682

Professor Nir Piterman, University of Gothenburg and Chalmers

Professor Christel Baier, Technische Universität Dresden

Categories
News

SDU behind database for robot assembly: Industrial companies must learn from their own and other companies’ data

31 MAY 2022

SDU behind database for robot assembly: Industrial companies must learn from their own and other companies' data

A new approach to gathering robotics experience from industry has been launched, so that companies do not have to start from scratch every time they need to put robots into production. SDU is leading the project, which aims to make better use of robotic data.

As robots and automation solutions become more common in industrial companies, plans and project descriptions grow in number, often ending up in a digital folder on a company PC – if such data is stored at all. In most cases, the data is never used again.

That is the reality today, but in the future, established automation solutions, robots and the data produced during production should be reusable, possibly even across companies. In that way, companies do not have to design a complete solution from scratch themselves.

This is the ambition of the ReRoPro project (Re-Use of Robotic-data in Production through search, simulation and learning) which is funded by the national research centre DIREC and led by the Mærsk Mc-Kinney Møller Institute at SDU.

– Big data is already being used in the IT field, where it has been crucial for developments in areas such as facial or object recognition. In the robotics field, there is great potential for gathering data and experience from companies that have already adopted new technology and automation in their production, says Professor Norbert Krüger from SDU Robotics at the Faculty of Engineering.

– Today, each company stores robot data – if at all – in their own format, but such data could just as well benefit others so that we can spread the use of robots and ultimately maintain or even attract production back to Denmark, he says.

Giants onboard

SDU is behind ReRoPro, together with Aalborg University and the University of Copenhagen, which are also partners in the DIREC project. The project also includes two Danish industrial giants, Novo Nordisk and Rockwool, the robotics company Nordbo from the Funen robotics cluster, and the Allerød-based company Welltec. Odense Robotics and MADE are also partners.

– The ambition is to create a structure for a database with the help of the companies. We aim to gather information from existing solutions in a structure that makes it possible to reuse, or be inspired by, that data when creating new solutions, says the SDU professor.

Novo Nordisk and Rockwool are already on board, but the intention is to get even more companies involved, Norbert Krüger stresses. That’s why a conference is planned for 8 September, where he hopes many industrial companies will come forward.

– For a lot of companies, this kind of data is considered a company secret, so we need to find a way for them to learn from each other in a safe environment. At the same time, we want to know more about what companies need out in the field so that we can take that into our work, says Norbert Krüger.

Initially, the project will run until autumn. The plan is to outline a roadmap and invite industry along – via the conference – so that funding can then be secured for a significant research project based on the initial work, which is starting now.

Read more about the conference
Read more about the project

FACTS

  • ReRoPro (Re-Use of Robotic-data in Production through search, simulation, and learning).

  • A new DIREC project headed by SDU in Odense, with Aalborg University and the University of Copenhagen involved. The ambition is to include many companies; already now, the two Danish industrial giants Novo Nordisk and Rockwool, the robotics company Nordbo from the Funen robotics cluster, and Welltec from Allerød are on board. Odense Robotics and MADE are also partners.

  • The initial project, funded by Innovation Fund Denmark, is to establish a structure for the database within six months, after which the ambition is to follow up with a larger project building the actual database and storing big data.

  • On 8 September, a conference is planned to gather companies and potential partners to discuss the ambitions and needs of industry.

Categories
News

New technologies can help banks, insurance companies and authorities fight fraud

26 APRIL 2022

New technologies can help banks, insurance companies and authorities fight fraud

Blockchain-based technologies can be used for more than cryptocurrencies. The technology eliminates the need for an intermediary when making transactions between two parties and can ensure that data cannot be modified.

Combining this feature with cryptographic techniques will enable banks and authorities to share sensitive personal data securely and fight fraud. This is exactly the purpose of a new project between researchers from Aarhus University, the IT University of Copenhagen and the Alexandra Institute, which is supported by DIREC – Digital Research Centre Denmark.

Read more (in Danish)