Categories
AI Completed project Future of work Green Tech News

Explainable AI will disrupt the grain industry and give farmers confidence

4 July 2023


There is a huge potential for AI in the agricultural sector as a large part of food quality assurance is still handled manually. The aim of a research project is to strengthen understanding of and trust in AI and image analysis, which can improve quality assurance, food quality and optimize production.

One of the major barriers to using AI and image analysis in the agriculture and food industry is trust in their effectiveness.

Today, manual visual inspection of grains remains one of the crucial quality assurance procedures throughout the value chain, following grains from field to table and ensuring that farmers receive the right price for their crops.

At the Danish-owned family company FOSS, high-tech analytical instruments are developed for the agriculture and food industry, as well as the chemical and pharmaceutical industries.

Since its founding in 1956 by engineer Nils Foss, development and innovation have been high priorities. As a global producer of niche products, staying ahead of competitors is essential.

Hence, collaboration with researchers from the country’s universities is a crucial part of the company’s digital journey. In a project at the National Research Centre for Digital Technologies (DIREC), the company, along with researchers from Technical University of Denmark and University of Copenhagen, aims to map how AI and image analysis can replace the subjective manual inspection of grains with an automated solution based on image processing. The goal is to develop a method using deep learning neural networks to monitor the quality of seeds and grains using multispectral image data. This method has the potential to provide the grain industry with a disruptive tool to ensure quality and optimize the value of agricultural commodities.

The agricultural and food industry is generally a very conservative industry, and building trust in digital technologies is necessary, explains senior researcher Erik Schou Dreier from FOSS. The development of AI, therefore, cannot stand alone. To encourage farmers to adopt the technology, it is crucial to instill confidence in how it works. In this process, researchers use explainable AI to elucidate how the algorithms function.

Today, grain is assessed manually in many places, and replacing manual work with a machine requires trust. Because the work is performed by humans, the reference method used today is fairly subjective: people do not necessarily perform the inspection the same way every time and can arrive at different results, so there is some uncertainty about the outcome.

Mapping and explaining algorithms

– The result is more precise when using AI and image analysis in the process. However, for these new technologies to gain widespread acceptance globally, a model is needed to explain how AI works and arrives at a given result, says Erik Schou Dreier.

Many people have inherent skepticism toward self-driving cars. Self-driving cars need to be even better and safer at driving than us humans before we trust them. Similarly, the AI analysis models we work with must be significantly better than the manual processes they replace for people to trust them. To build that trust, we must first be able to explain how AI analyzes an image and arrives at a given result. That is the goal of the project—to interpret the way AI works, so people can understand how it reads an image.
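One widely used family of explanation techniques works by perturbing the input and watching the prediction change. The sketch below is a hypothetical stand-in, not FOSS's actual method: it applies occlusion sensitivity, masking patches of an image and recording how much the classifier's score drops, which highlights the regions the model relied on.

```python
import numpy as np

def occlusion_map(image, classify, patch=4):
    """Score each patch by how much masking it lowers the classifier's
    output: large drops mark the image regions the model relied on."""
    base = classify(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0  # mask one patch
            heat[i // patch, j // patch] = base - classify(occluded)
    return heat

# Stand-in "grain classifier": scores an image by its mean brightness.
def classify(img):
    return float(img.mean())

img = np.zeros((8, 8))
img[1:3, 1:3] = 1.0  # a bright "kernel" inside the top-left patch
heat = occlusion_map(img, classify, patch=4)
# heat[0, 0] is largest: the decision rests on the top-left patch.
```

For a real grain classifier the same loop would run over multispectral patches, producing a heatmap that can be shown next to the image as the explanation.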

We typically accept a higher error rate among humans than machines. For us humans to trust the algorithms, they need to be explainable.
Erik Schou Dreier, senior researcher

PhD student Lenka Tetková from the Technical University of Denmark is part of the project and spends some days at FOSS’ office. Here, she works with images of grains in two ways: partly to improve image quality and partly to better understand how the classifications work so they can be enhanced.

– I sometimes use the example of a zebra and a deer to explain how image classification works. Imagine you have a classifier that can recognize zebras and deer. Now, you get a new image of an animal with a body like a deer, but legs that resemble those of a zebra. A standard model will not be able to recognize this animal if it hasn’t seen it during training. But if you provide the model with additional information (metadata) – in this case, a description of all kinds of animals – it will be able to infer that the image shows an okapi, based on its knowledge of zebras, deer, and the description of an okapi. That is, the model can use information not present in the images to achieve better results, explains Lenka Tetková and continues:

– In this project, we want to use metadata about the grains, such as information about the place of origin, weather conditions, pesticide use, and storage conditions, to improve the classification of grains.

Can you find the okapi in these pictures? PhD student Lenka Tetková from DTU uses this example to explain how image classification works.
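The metadata idea can be sketched with a toy classifier. Everything here is illustrative: the feature values and class names are invented, and a nearest-centroid rule stands in for a deep network. The point is only that concatenating metadata (such as moisture or storage conditions) with image features lets a model separate classes whose images look alike.

```python
import numpy as np

def fuse(image_features, metadata):
    """Concatenate image features with metadata features."""
    return np.concatenate([image_features, np.asarray(metadata, float)])

# Two grain classes whose image features happen to look alike, but whose
# typical metadata (say, moisture % and storage days) differ.
img_a, meta_a = np.array([0.8, 0.1]), [12.0, 30.0]
img_b, meta_b = np.array([0.8, 0.1]), [18.0, 90.0]
centroids = {"class_a": fuse(img_a, meta_a), "class_b": fuse(img_b, meta_b)}

def classify(image_features, metadata):
    """Nearest-centroid rule over the fused image+metadata vectors."""
    x = fuse(image_features, metadata)
    return min(centroids, key=lambda name: np.linalg.norm(x - centroids[name]))

# Ambiguous image features are resolved by the accompanying metadata.
label = classify(np.array([0.8, 0.1]), [17.5, 85.0])
```

In practice the metadata would be embedded and weighted rather than concatenated raw, but the principle is the same.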

An important competitive advantage

As a global producer of niche products, FOSS must always stay two steps ahead of competitors.

– To ensure there is a market for us in the future, it is crucial to be the first with new solutions. It is challenging to make a profit if there is already a player doing it better, which is why we constantly introduce new digital technologies to improve our analysis tools. And here, collaboration with researchers from the country’s universities is very valuable to us, as we gain new insights and proposed solutions for the further development of our tools, says Erik Schou Dreier and continues:

– In this project, we hope, first, that the collaboration with researchers will lead to AI methods and tools that enable us to create new solutions for automated image-based quality assessment and, second, that we can increase trust in our product with explainable AI. It is one of the critical themes for us: to create a product that is trusted.

Facts about FOSS

FOSS’ measuring instruments are used everywhere in the agriculture and food industry to quality assure a wide range of raw materials and finished food products.

Traditionally, light wavelengths are measured, and the measurements are used to obtain chemical information about a product. This can include knowledge about protein and moisture content in grains or fat and protein in milk, etc.
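As a rough illustration of how such measurements become chemical estimates, absorbance at a few wavelengths can be calibrated against laboratory reference values with a linear model (the Beer-Lambert law suggests an approximately linear relation between absorbance and concentration). All numbers below are invented for the sketch; real instruments use many more wavelengths and more careful chemometric models.

```python
import numpy as np

# Hypothetical calibration set: absorbance at three wavelengths for four
# grain samples, with protein content (%) from a reference lab method.
A = np.array([[0.10, 0.40, 0.22],
              [0.12, 0.55, 0.30],
              [0.09, 0.35, 0.20],
              [0.15, 0.70, 0.41]])
protein = np.array([10.30, 13.80, 9.10, 17.65])

# Fit a linear calibration model by least squares.
coef, *_ = np.linalg.lstsq(A, protein, rcond=None)

def predict(spectrum):
    """Estimate protein content from a new absorbance spectrum."""
    return float(spectrum @ coef)

est = predict(np.array([0.11, 0.50, 0.27]))  # about 12.55 %
```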

FOSS’ customers are large global companies that use FOSS’ products to quality assure and optimize their production—and to ensure the right pricing, so, for example, the farmer gets the right price for their grain.

Deep Learning and Automation of Imaging-based Quality of Seeds and Grains

Project Period: 2020-2024
Budget: DKK 3.91 million

Project participants:

Lenka Tetková
Lars Kai Hansen, Professor DTU
Kim Steenstrup Pedersen, Professor, KU
Thomas Nikolajsen, Head of Front-end Innovation, FOSS
Toke Lund-Hansen, Head of Spectroscopy Team, FOSS
Erik Schou Dreier, Senior Scientist, FOSS

What is a Deep Learning Neural Network?

Deep learning neural networks are computer systems inspired by how our brains function. A network consists of artificial neurons, called nodes, organized in layers. Each node takes in information, processes it, and passes it on to the next layer. This helps the network understand data and make predictions. By training the network with examples and adjusting the connections between nodes, it learns to make accurate predictions on new data. Deep learning neural networks are used for tasks such as image recognition, language understanding, and problem-solving.
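A minimal, hand-wired example shows how nodes organized in layers pass information forward. The weights are chosen by hand rather than learned, to keep the sketch deterministic: one hidden node detects "either input is on", the other "both inputs are on", and the output node combines them to compute XOR, a function no single-layer network can represent.

```python
import numpy as np

def step(z):
    """A simple on/off activation."""
    return (z > 0).astype(float)

# Weights set by hand for illustration (training would normally learn
# them): hidden node 1 fires when either input is on, node 2 when both are.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
W2 = np.array([[1.0], [-1.0]])  # output: "either, but not both" = XOR
b2 = np.array([-0.5])

def forward(X):
    h = step(X @ W1 + b1)     # layer 1: each node processes and passes on
    return step(h @ W2 + b2)  # layer 2: combine into a prediction

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
pred = forward(X)  # XOR of the two inputs: 0, 1, 1, 0
```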


Cyber-Physical Systems with Humans in the Loop

DIREC project


Summary

Constructing cyber-physical systems with humans in the loop enables new possibilities in novel application areas (e.g., bio-computing, active learning systems, and intelligent medical systems). Many of these novel applications are imagined, developed, and deployed to enable humans and machines to engage collaboratively in real-world tasks. Thus, these applications have aspects of both Cyber-Physical Systems (CPS) and Socio-Technical Systems (STS) and are characterized by close cooperation between humans and software technologies, including designing for situational awareness, safety, privacy, usability, and easy error handling.

To establish a collaboration on the topic, the project will define cross-disciplinary terminology for the involved research areas, list challenges focusing on novel application areas, and survey the state of the art for the identified challenges. In workshops, the project will map which of the listed challenges are important for Danish industry to address in future work. The project will combine literature studies with workshops and foster future collaboration to address existing challenges. The project goal is to foster collaboration among DIREC partners on this topic and publish a survey based on the outcomes of the work.

Project period: 2021-2023
Budget: DKK 0.46 million

Project Manager

  • Associate Professor Mahyar Tourchi Moghaddam
  • Maersk Mc-Kinney Moller Institute, SDU
  • mtmo@mmmi.sdu.dk

Scientific value: The project will provide a better terminology and a common understanding of state-of-the-art across several areas of research within DIREC and disseminate this knowledge to the scientific community.

Capacity building: The project will establish new collaboration setups within DIREC and involve master students in the activities. 

Business value: The project will in workshops disseminate knowledge to Danish industry and identify cases that could be relevant areas of collaboration for DIREC with Danish Industry in future larger projects. The project will among others connect to the community involved in the Nordic IoT Center.



Re-use of robotic data in production through search, simulation and learning

DIREC project


Summary

A robot database with information on previous robot solutions can save manufacturing companies time and money and allow smaller companies to automate their production as well.

Although it sounds simple, creating a robot database involves several challenges. For example, robot data are complex, consisting of images, trajectories, force vectors, information on different materials, CAD files, etc.

With input from industry and international experts, this completed project has gained a much better understanding of the challenges. The next step is to develop software that allows for the reuse of robot data.

Project Period: 2021-2022

Project Manager

  • Professor Norbert Krüger
  • Maersk Mc-Kinney Moller Institute, SDU
  • norbert@mmmi.sdu.dk

This is the conclusion of the ReRoPro project, which investigated how such a database could be built. With input from industry and international experts, the researchers have gained a much better understanding of the challenges involved; robot data are complex, consisting of images, trajectories, force vectors, information on different materials, CAD files, etc.

The next step is to apply for funding to develop software that allows for the reuse of robot data. The research project was a cooperation between the University of Southern Denmark, the University of Copenhagen, and Aalborg University, with the companies Rockwool, Novo Nordisk, Nordbo Robotics, and WellTec as partners.

Impact

The project gained valuable knowledge about how to create a robot database that can save manufacturing companies time and money and allow smaller companies to automate their production.


DeCoRe: Tools and Methods for the Design and Coordination of Reactive Hybrid Systems

DIREC project


Summary

A recurring problem of digitalised industries is to design and coordinate hybrid systems that include IoT (Internet of Things), edge, and cloud solutions. Currently adopted methods and tools are not effective to this end, because they rely too much on informal specifications that are manually written and interpreted by humans.

We aim to explore the applicability of forefront technologies and methods developed at SDU, KU, and AAU for the design of reactive hybrid IoT-edge-cloud architectures in Danish industry. These technologies are based on unambiguous formal languages, which can be processed by computers to check for desirable design properties (such as compatibility of software interfaces) and to deploy components for monitoring the correct functioning of systems. Adopting these techniques has been shown to substantially increase the productivity of digital industries (for example, up to a 4x increase in development speed).
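The kind of machine-checkable property mentioned above, interface compatibility, can be illustrated with a deliberately simplified sketch. The operation and type names are invented, and the dictionary format is not real Jolie, UPPAAL, or DCR syntax; it only shows the principle of a computer checking that a provider satisfies a consumer's requirements.

```python
# Hypothetical machine-readable interfaces: each operation maps to its
# (input type, output type) signature.
consumer_needs = {
    "startLine": ("EggBatch", "Ack"),
    "readSensor": ("SensorId", "Reading"),
}
provider_offers = {
    "startLine": ("EggBatch", "Ack"),
    "readSensor": ("SensorId", "Reading"),
    "stopLine": ("LineId", "Ack"),
}

def incompatibilities(needs, offers):
    """A provider is compatible if it offers every operation the consumer
    needs with a matching signature; return the operations that fail."""
    return [op for op, sig in needs.items() if offers.get(op) != sig]

issues = incompatibilities(consumer_needs, provider_offers)  # [] = compatible
```

A formal-language toolchain performs the same kind of check, but over much richer specifications (message types, protocols, timing).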

We will: 

  1. carry out a concrete use case with a partner company (Sanovo Technology Group)
  2. initiate knowledge sharing on this topic among AAU, KU, and SDU through workshops
  3. communicate our findings to the rest of the DIREC community.

Project period: 2021-2023

Project Manager

  • Professor Fabrizio Montesi
  • Department of Mathematics and Computer Science, SDU
  • fmontesi@imada.sdu.dk

Scientific value
The scientific value of the project is twofold:

  • (a) concrete knowledge on the advantages and potential challenges brought by the application of cutting-edge techniques like Jolie for the development of hybrid systems (IoT-edge-cloud) in the Danish industry (using Sanovo Technology Group for the case study); and
  • (b) knowledge on the synergies and future directions for the integration of forefront scientific methods for hybrid systems developed by Danish universities (Jolie, UPPAAL, DCR Graphs). Providing a perspective that comes from concrete industrial experience, with substantiated needs, has significant potential to influence the future development of both research and industry in Denmark.

Capacity building
Companies will benefit from an increased number of students they can hire to meet their needs with respect to hybrid systems. Universities benefit by gaining suitable candidates for PhD positions in future projects connected to this exploration.

Business and societal value
Due to the growth potential in solutions for automation and data-intensive processing, this project will strengthen Danish competitiveness through a reduced cost of developing, deploying, and running IoT and cloud software. Potentially, this could lead to increased export of IT products and services.


Initiatives to improve recruitment and retention of IT students

DIREC project


Summary

Denmark needs more IT specialists. But how do we get more young people to study computer science and become IT specialists? This project, consisting of two subprojects, focuses on initiatives that can improve both recruitment and retention of a larger and more diverse group of young people, e.g., female students and students without prior programming experience.

Project Manager

  • Professor Claus Brabrand
  • Department of Computer Science, ITU
  • brabrand@itu.dk

Diversity or Not: Heterogeneous vs Homogeneous Study Groups
Summary

The first subproject Diversity or Not: Heterogeneous vs Homogeneous Student Groups? will study the effect of diversity on the formation of CS student groups. The intent is to uncover evidence to issue recommendations on how to best form project groups. We expect this knowledge to be beneficial for the recruitment and retention of students as well as for the diversity of students.

Value Creation

We expect the outcomes of this project to create significant value primarily for the Danish universities, but also for the Danish tech industry. The project intends to derive research-based recommendations on how to best form (student) project groups. Since group work is widespread throughout Computer Science education in Denmark, used to foster communication and collaboration skills around a shared problem, it is important to figure out what works best. This will strengthen CS education in Denmark.

Studying the impact of diversity on project groups will also be important as a proxy for professional groups in a work context beyond university (with the obvious external threats to the validity of this generalization). We expect this knowledge to be beneficial for the recruitment and retention of students as well as for the diversity of the students (e.g., female students and students without prior programming experience). Aside from the experiments themselves and their findings, we intend to create and publish (and seek independent approval of) generic experimental protocols for how to ethically and responsibly conduct such group diversity-performance experiments, including how to quantify group diversity and group performance. We imagine these generic protocols would be relevant for other studies and for companies seeking to specialize them into their own more specific instances of the experiments. This also includes ethical considerations surrounding similar student experiments and how to make them ethically safer.

D-Pop – A Danish Annual Programming and Problem Solving Event

Summary

The second subproject, D-Pop – A Danish Annual Programming and Problem Solving Event, will plan, organize, and implement physical D-Pop events at Danish CS departments aimed at young people who are beginning programmers at all levels. The participants gain increased programming skills and a new perspective on programming and problem solving, because the focus is on collaboration, creativity, and curiosity.

We expect the events to have a positive effect on recruitment and retention of students as well as for the diversity of students.

Value Creation

The expected results of D-Pop are: 

1. Dramatically increased programming skills among participants. This is the expected outcome of participation itself, akin to training in any other skill, and includes improved programming language mastery, problem-solving skills, resilience, collaboration skills, debugging, and computational problem solving (in particular, algorithmic thinking). This competence boost is independent of the rung of the competence ladder on which the participant starts. The problems with recruiting technically competent IT professionals in Denmark are well known.

2. Increased exposure and recruitment. D-Pop complements the existing palette of outreach and recruitment activities currently used by Danish CS departments. Compared with similar events, D-Pop content is designed with a focus on immediate, satisfying, and positive feedback to beginning programmers, but in a way that is both honest and values competence, agency, and collaboration. Scalability is built into D-Pop’s infrastructure (both technical and social) from the start.

3. Establishment of a national network of problem setters. The value of this extends beyond D-Pop and immediately includes teaching material for high schools and universities. For example, the Danish High School Informatics Olympiad (Dansk datalogidyst, of which Thore is a founding steering committee member) is in many aspects the opposite of D-Pop: it is individual, highly competitive, and participation is restricted. However, the requirements for the network of people needed to make DDD work are identical to those of D-Pop. Denmark is very far behind its Nordic neighbours on this point (not to speak of other countries, where these activities are multi-million dollar industries).


Accountability Privacy Preserving Computation via Blockchain

DIREC project


Summary

This project aims to combine secure multiparty computation and blockchain techniques, to enable efficient privacy-preserving computation with accountability, allowing computation on private data while maintaining an audit trail for third-party verification. The project can potentially help fight discrimination, catch unethical and fraudulent behavior, and generate positive publicity for honest participation.

Project period: 2022-2024

Project Manager

  • Associate Professor Bernardo David
  • Department of Computer Science, ITU
  • beda@itu.dk

 

The project will investigate how to combine secure multiparty computation and blockchain techniques to obtain more efficient privacy-preserving computation with accountability. Privacy-preserving computation with accountability allows computation on private data (without compromising data privacy) while producing an audit trail that allows third parties to verify that the computation succeeded or to identify bad actors who tried to cheat. Applications include data analysis (e.g., in the context of discrimination detection and benchmarking) and fraud detection (e.g., in the financial and insurance industries).

Value Creation

Using this kind of auditable continuous secure computation can help fight discrimination and catch unethical and fraudulent behavior. Computations that advance these goals include aggregate statistics on salary information to help identify and eliminate wage gaps (e.g., as seen in the Boston wage gap study [4]), statistics on bids in an auction or bets on a gambling site to determine whether those bids or bets are fraudulent, and many others.
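As a toy sketch of the underlying idea (not the project's actual protocol), additive secret sharing lets parties compute an aggregate without revealing individual inputs, while hash commitments give a minimal audit trail that could be posted to a blockchain for later third-party verification. All figures are invented.

```python
import hashlib
import random

random.seed(7)
P = 2**31 - 1  # arithmetic is done modulo a prime

def share(secret, n=3):
    """Additively secret-share a value: any n-1 shares reveal nothing."""
    parts = [random.randrange(P) for _ in range(n - 1)]
    parts.append((secret - sum(parts)) % P)
    return parts

# Three organisations secret-share their salary totals.
salaries = [52_000, 61_000, 47_000]
shares = [share(s) for s in salaries]

# Each party sums the shares it holds; combining the partial sums
# reveals only the aggregate, never an individual input.
partials = [sum(col) % P for col in zip(*shares)]
total = sum(partials) % P

# Minimal audit trail: hash commitments to the shares, which could be
# posted to a blockchain-style log for later verification.
audit_log = [hashlib.sha256(repr(s).encode()).hexdigest() for s in shares]
```

Real protocols add authenticated commitments and zero-knowledge checks so that cheating parties can be identified, which is exactly the accountability aspect the project studies.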

Organizations would not be able to carry out such computations without privacy-preserving technologies, due to privacy regulations, so secure computation is necessary here. To be useful, these secure computations crucially require authenticity and consistency of the inputs. Organizations, which will not necessarily be driven by altruism, will have several incentives to participate in these computations.

First, by using secure computation to detect fraud, the participants can guard against financial loss.

Second, when participants are public organizations, honest participation (which anyone can verify) will generate positive publicity.


Certifiable Controller Synthesis for Cyber-Physical Systems

DIREC project


Summary

As cyber-physical systems (CPSs) are becoming ever more ubiquitous, many of them are considered safety-critical. We want to help CPS manufacturers and regulators establish high levels of trust in automatically synthesized control software for safety-critical CPSs. To this end, we propose to extend the technique of formal certification towards controller synthesis: controllers are synthesized together with a safety certificate that can be verified by highly trusted theorem provers.
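The idea of pairing a synthesized controller with an independently checkable certificate can be shown on a deliberately tiny example: a discretised thermostat. Here the "certificate" is an invariant set of temperatures, and the theorem prover is replaced by exhaustive enumeration, which is feasible only because the toy state space is finite and small.

```python
# Toy setting: a thermostat over whole-degree temperatures. The heater
# either runs (+1 degree per step) or is off (-1 degree per step).
SAFE = range(18, 23)  # the safe temperatures, 18..22

def dynamics(temp, heat_on):
    return temp + 1 if heat_on else temp - 1

# "Synthesis": for each state in a candidate invariant, pick an action
# that keeps the system inside the invariant.
INVARIANT = {19, 20, 21}  # the safety certificate
controller = {}
for t in sorted(INVARIANT):
    for action in (True, False):
        if dynamics(t, action) in INVARIANT:
            controller[t] = action
            break

# Independent certificate check (the theorem prover's job, done here by
# exhaustive enumeration): the invariant lies inside the safe set and
# is closed under the controller's choices.
certified = all(t in SAFE and dynamics(t, controller[t]) in INVARIANT
                for t in INVARIANT)
```

The crucial point is that the checker never trusts the synthesis tool: it only needs the controller and the invariant, and re-verifies safety from scratch.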

Project period: 2022-2023

Project Manager

  • Assistant Professor Martijn Goorden
  • Eindhoven University of Technology

Value Creation

From a distant viewpoint, our project aims to increase confidence in safety-critical CPSs that interact with individuals and society at large. This is the main motivation for applying formal methods to the construction of CPSs. However, our project aims to give a unique spin to this. By cleverly combining the existing methods of controller synthesis, (timed-automata) model checking, and interactive theorem proving by means of certificate extraction and checking, we aim to facilitate the construction of control software for CPSs that ticks all the boxes: high efficiency, a very high level of trust in the safety of the system, and the possibility to independently audit the software. Given that CPSs have already conquered every sector of life, with the bulk of the development still ahead of us, we believe such an approach could make an important contribution towards technology that benefits people.

Moreover, our approach aims to ease the interaction between the CPS industry and certification authorities. We believe it is an important duty of regulatory authorities to safeguard their citizens from failures of critical CPSs. Even so, regulation should not grind development to a halt. With our work, we hope to somewhat remedy this apparent conflict of interests. By providing a means to check the safety of synthesized controllers in a well-documented, reproducible, and efficient manner, we believe that the interaction between producers and certifying bodies could be sped up significantly, while increasing reliability at the same time. On top of that, controller synthesis has already been intensely studied and seems to be a rather mature technology from an academic perspective. However, it has barely set foot in industrial applications. We are confident that formal certificate extraction and checking can be an important stepping stone to help controller synthesis make this jump.

This project also contributes to the objective of DIREC to bring new academic partners together in the Danish eco-system. The two principal investigators have their specialization background in two different fields (certification theory and control theory) and have not collaborated before. Thus the project strengthens the collaboration between the two fields as well as the collaboration between the two research groups at AU and AAU. This creates the opportunity for the creation of new scientific results benefiting both research fields. 

Finally, we plan to generate tangible value for industry. There are many present-day use cases for control software of critical CPSs. During our project, we want to aid these use cases with controllers that tick all of the aforementioned “boxes”. This can be done by initiating several student projects and theses supporting theory development, tool implementation, and use case demonstration. The Problem Based Learning approach of Aalborg University facilitates this greatly. Furthermore, those students can use their experience in future positions after graduating.


Methodologies for scheduling and routing droplets in digital microfluidic biochips

DIREC project


Summary

The overall purpose of this project is to define, investigate, and provide preliminary methodologies for scheduling and routing microliter-sized liquid droplets on a planar surface in the context of digital microfluidics (DMF).

The main idea is to use a holistic approach in the design of scheduling and routing methodologies that takes into account real-world physical, topological, and behavioral constraints, thus producing solutions that can immediately find use in practical applications.

Project period: 2021-2022

Project Manager

  • Associate Professor Luca Pezzarossa
  • Department of Applied Mathematics and Computer Science, DTU
  • lpez@dtu.dk

Value Creation

DMF biochips have been in the research spotlight for over a decade. However, the technology is still not mature enough to deliver extensive automation for applied biochemistry processes or for research purposes. One of the main reasons is that, although rather simple in construction, DMF biochips lack a clear automated procedure for being programmed and used. The existing methodologies for programming DMF biochips require an advanced understanding of software programming and of the architecture of the biochip itself. These skills are not commonly found among the potential target users of this technology, such as biologists and chemists.

A fully automated compilation pipeline able to translate biochemical protocols expressed in a high-level representation into low-level biochip control sequences would open DMF technology to a much larger number of researchers and professionals. The lack of the advanced scheduling and routing methodologies investigated by this project is one of the main obstacles to broadly accessible DMF technology. This is particularly relevant for researchers and small businesses that cannot afford the large pipetting robots commonly used to automate industrial biochemical protocols. One or more DMF biochips can be programmed to execute ad hoc repetitive and tedious laboratory tasks, thus freeing qualified working hours for more challenging laboratory work.

In addition, the scheduling and routing methodologies targeted by this project enable online decisions, such as controlling the flow of a biochemical protocol depending on on-the-fly sensing results from the processes occurring on the biochip. This opens up a large set of possibilities in the biochemical research field. For instance, the behavior of complex biochemical protocols can be automatically adapted during execution using decisional constructs (if-then-else), allowing for real-time protocol optimization and monitoring.
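A first approximation of droplet routing is shortest-path search on the electrode grid. The sketch below uses breadth-first search and deliberately ignores real DMF constraints such as minimum droplet spacing and timing, which the project's methodologies must handle.

```python
from collections import deque

def route_droplet(grid, start, goal):
    """Shortest droplet route on a planar electrode grid (BFS), treating
    electrodes marked 1 (e.g. occupied by another droplet) as blocked."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:  # reconstruct the path back to the start
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # no feasible route

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],  # 1 = blocked electrode
        [0, 0, 0, 0]]
path = route_droplet(grid, (0, 0), (2, 3))  # routes around the blockage
```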

From a scientific perspective, this project would enable cross-field collaboration, develop new methodologies, and potentially re-purpose techniques that are well known in one research field to solve problems in another. Interesting possibilities include adapting advanced routing and graph-related algorithms or applying well-known online-algorithm techniques to manage the real-time flow control of the biochemical protocol. The cross-field nature of the project has the potential to provide a better understanding of how advanced scheduling and routing techniques can be applied in a strongly constrained application such as DMF biochips, thus laying the groundwork for novel solutions, collaborations, and further research.

Finally, it should be mentioned that the outcome of this project, or of a future larger project based on the proposed explorative research, has concrete business value. Currently, some players have entered the market with DMF biochips built to perform a specific biochemical functionality [12,13]. A software stack that includes compilation tools supporting programmability, enabling the same DMF biochip to perform different protocols, would largely expand the potential market for the technology. This is not the primary aim of this research project, but it is indeed a long-term possibility.


Automated Verification of Sensitivity Properties for Probabilistic Programs

DIREC project


Summary

Sensitivity measures how much program outputs vary when changing inputs. We propose exploring novel methodologies for specifying and verifying sensitivity properties of probabilistic programs such that they (a) are comprehensible to everyday programmers, (b) can be verified using automated theorem provers, and (c) cover properties from the machine learning and security literature.

Project period: 2022-2023

Project Manager

  • Associate Professor Christoph Matheja
  • Department of Applied Mathematics and Computer Science, DTU
  • chmat@dtu.dk

and

  • Postdoc Alejandro Aguirre
  • Department of Computer Science, AU
  • alejandro@cs.au.dk

Our overall objective is to explore how automated verification of sensitivity properties of probabilistic programs can support developers in increasing the trust in their software through formal assurances.

Probabilistic programs are programs with the ability to sample from probability distributions. Examples include randomized algorithms, where sampling is exploited to ensure that expensive executions have a low probability, cryptographic protocols, where randomness is essential for encoding secrets, and statistics, where programs are becoming a popular alternative to graphical models for describing complex distributions.
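As a minimal illustration (our own sketch, not taken from the project), randomized quickselect is a probabilistic program of the first kind: sampling the pivot makes the expensive executions unlikely, giving linear expected running time.

```python
import random

def randomized_quickselect(xs, k, rng=random):
    """Return the k-th smallest element of xs (0-indexed).

    A classic probabilistic program: the pivot is sampled uniformly,
    which makes the worst-case O(n^2) executions improbable and
    yields O(n) expected running time.
    """
    assert 0 <= k < len(xs)
    pivot = rng.choice(xs)                 # the sampling instruction
    lo = [x for x in xs if x < pivot]
    eq = [x for x in xs if x == pivot]
    hi = [x for x in xs if x > pivot]
    if k < len(lo):
        return randomized_quickselect(lo, k, rng)
    if k < len(lo) + len(eq):
        return pivot
    return randomized_quickselect(hi, k - len(lo) - len(eq), rng)
```

Whatever pivots are drawn, the result is deterministic; only the running time depends on the randomness.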

The sensitivity of a program determines how its outputs are affected by changes to its input; programs with low sensitivity are robust against fluctuations in their input – a key property for improving trust in software. Minor input changes should, for example, not affect the result of a classifier learned from training data. In the probabilistic setting, the output of a program depends not only on the input but also on the source of randomness. Hence, the notion of sensitivity – as well as techniques for reasoning about it – needs refinement.
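A small sketch (ours, with illustrative numbers) shows why the source of randomness matters: if two executions of a noisy-mean program share their random seed, the noise cancels and the output difference is bounded by the input change divided by the data size.

```python
import random

def noisy_mean(data, seed):
    """Mean of the data plus Gaussian noise from a seeded source."""
    rng = random.Random(seed)
    return sum(data) / len(data) + rng.gauss(0.0, 1.0)

# Couple the two executions by sharing the seed: the noise terms are
# identical, so the output varies by at most (input change) / n,
# i.e. the mean is (1/n)-sensitive.
a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 3.0, 5.0]   # one entry changed by 1
delta = abs(noisy_mean(a, seed=42) - noisy_mean(b, seed=42))
# delta is 1/4, independent of the seed
```

Without sharing the randomness, the two outputs could differ arbitrarily, which is why the probabilistic notion of sensitivity needs refinement.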

Automated verification takes a deductive approach to proving that a program satisfies its specification: users annotate their programs with logical assertions; a verifier then generates verification conditions (VCs) whose validity implies that the program’s specification holds. Deductive verifiers are more complete and more scalable than fully automatic techniques but require significant user interaction. The main challenge for users of automated verifiers lies in finding suitable intermediate assertions, particularly loop invariants, such that an automated theorem prover can discharge the generated VCs. A significant challenge for developers of automated verifiers is to keep the amount and complexity of necessary annotations as low as possible.

Previous work [1] co-authored by the applicants provides a theoretical framework for reasoning about the sensitivity of probabilistic programs: the paper presents a calculus for carrying out “pen-and-paper” proofs of sensitivity in a principled and syntax-directed manner. The proposed technique deals with sampling instructions by requiring users to identify suitable probabilistic couplings, which act as synchronization points, on top of finding loop invariants. However, the technique is limited in the sense that it does not provide tight sensitivity bounds when changes to the input cause a program to take a different branch of a conditional.
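To make the notion of a probabilistic coupling concrete, here is a standard textbook construction (our illustration, not the project's calculus): two Bernoulli samples driven by one shared uniform draw, so that one sample never exceeds the other pointwise.

```python
import random

def coupled_bernoullis(p, q, rng):
    """Sample Bernoulli(p) and Bernoulli(q) from ONE shared uniform draw.

    This "monotone coupling" is the kind of synchronization point a
    user identifies: with the shared randomness, p <= q implies the
    first sample never exceeds the second in every execution, not
    just in distribution.
    """
    u = rng.random()          # the shared source of randomness
    return (u < p), (u < q)

rng = random.Random(0)
ok = all(x <= y
         for x, y in (coupled_bernoullis(0.3, 0.7, rng)
                      for _ in range(10_000)))
# ok holds in every run, by construction of the coupling
```

Sampling the two Bernoullis independently would only give the inequality on average; the coupling turns it into a per-execution fact that a syntax-directed proof can use.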

Our project has four main goals. First, we will develop methodologies that do not suffer from the limitations of [1]. We believe that conditional branching can be treated by carefully tracking the possible divergence.

Second, we will develop an automated verification tool for proving sensitivity properties of probabilistic programs. The tool will generate VCs based on the calculus from [1], which will be discharged using an SMT solver. In designing the specification language, we aim to achieve a balance so that (a) users can conveniently specify synchronization points for random samples (via so-called probabilistic couplings) and (b) existing solvers can prove the resulting VCs.
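To illustrate what a verification condition states (a generic toy example of ours, unrelated to the tool's actual VC format), consider the invariant-preservation VC for a loop summing 1..n. An SMT solver would discharge it symbolically; here we brute-force a finite domain.

```python
from itertools import product

def gauss_sum(n):
    """The program being verified: sums 1..n with a loop."""
    x, s = n, 0
    while x > 0:
        x, s = x - 1, s + x
    return s

def invariant(n, x, s):
    # Loop invariant: s plus the sum still to be added equals n(n+1)/2.
    return s + x * (x + 1) // 2 == n * (n + 1) // 2

# Preservation VC: invariant AND guard  implies  invariant after the body.
# A deductive verifier emits this implication; an SMT solver proves it
# for all states. We merely check it over a small finite domain.
vc_holds = all(
    (not (invariant(n, x, s) and x > 0)) or invariant(n, x - 1, s + x)
    for n, x, s in product(range(10), range(10), range(60))
)
# vc_holds is True: the invariant is preserved by every loop iteration
```

Finding such an invariant is exactly the user's burden the project aims to reduce; in the probabilistic setting, couplings for sampling instructions must be supplied on top of it.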

Third, we aim to aid the verification process by assisting users in finding synchronization points. Invariant synthesis has been extensively studied in the case of deterministic programs. Similarly, coupling synthesis has been recently studied for the verification of probabilistic programs. We believe these techniques can be adapted to the study of sensitivity.

Finally, we will validate the overall verification system by applying it to case studies from machine learning, statistics, and randomized algorithms.


Understanding Biases and Diversity of Big Data used for Mobility Analysis

DIREC project

Summary

Our capabilities to collect, store, and analyze vast amounts of data have greatly increased in the last two decades, and today big data plays a critical role in the large majority of statistical algorithms. Unfortunately, our understanding of biases in data has not kept up. While there has been a lot of progress in developing new models to analyze data, there has been much less focus on understanding the fundamental shortcomings of big data.

This project will quantify the biases and uncertainties associated with human mobility data collected through digital means, such as smartphone GPS traces, cell phone data, and social media data.

Ultimately, we want to ask the question: is it possible to fix big mobility data through a fundamental understanding of how biases manifest themselves?

Project period: 2022-2024

Project Manager 

  • Associate Professor Vedran Sekara
  • Department of Computer Science, ITU
  • vsek@itu.dk

Value creation

We expect this project to have a long-lasting scientific and societal impact. The scientific impact of this work will allow us to explicitly model bias in algorithmic systems that rely on human mobility data and provide insights into which populations are left out. For example, it will allow us to correct for gender, wealth, age, and other types of biases in data used globally for epidemic modeling, urban planning, and many other use cases.

Further, having methods to debias data will allow us to understand the negative impacts that results derived from biased data might have. Given the universal nature of bias, we expect our debiasing frameworks will also pave the way for quantitative studies of bias in other realms of data science.
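One standard building block for this kind of correction is post-stratification reweighting. The sketch below is ours, with illustrative group labels and numbers, not the project's method: a biased smartphone-trace sample is re-aligned with known census shares.

```python
def poststratify_weights(sample_counts, population_shares):
    """Per-group weights that re-align a biased sample with known
    population shares (standard post-stratification; the groups and
    figures here are illustrative only).
    """
    n = sum(sample_counts.values())
    return {g: population_shares[g] / (sample_counts[g] / n)
            for g in sample_counts}

# A mobility sample that over-represents young users:
sample = {"18-34": 700, "35-59": 200, "60+": 100}
census = {"18-34": 0.30, "35-59": 0.45, "60+": 0.25}
w = poststratify_weights(sample, census)
# After weighting, each group's effective share matches the census:
# sample[g] * w[g] / n == census[g] for every group g
```

Reweighting only corrects along observed attributes; quantifying the biases one cannot observe is precisely the harder question the project asks.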

The societal impact will be actionable recommendations provided to policy makers regarding: 1) guidelines for how to safely use mobility datasets in data-driven decision processes, 2) tools (including statistical and interactive visualizations) for quantifying the effects of bias in data, and 3) directions for building fairer and more equitable algorithms that rely on mobility data.

It is important to address these issues now, because in their “Proposal for a Regulation on a European approach for Artificial Intelligence” from April 2021 the European Commission (European Union) outlines potential future regulations for addressing the opacity, complexity, bias, and unpredictability of algorithmic systems.

This document states that high-quality data is essential for algorithmic performance and suggests that any dataset should be subject to appropriate data governance and management practices, including examination for possible biases. This implies that, in the future, businesses and governmental agencies will need to have data-audit methods in place. Our project addresses this gap and provides value by developing methodologies to audit mobility data for different types of biases — producing tools from which Danish society and Danish businesses will benefit.
