DIREC project
Multimodal data processing of Earth Observation Data
Summary
Based on Earth observations, a number of Danish public organizations build and maintain important data foundations that are used for decision-making, e.g., for executing environmental law or making planning decisions in both private and public organizations in Denmark.
Together with some of these public organizations, this project aims to support the digital acceleration of the green transition by strengthening the data foundation for environmental data. There is a need for public organizations to utilize new data sources and create a scalable data warehouse for Earth observation data. This will involve building processing pipelines for multimodal data processing and designing user-oriented data hubs and analytics.
Project period: 2022-2025
Budget: DKK 12.27 million
Project Manager
- Professor Kristian Torp
- Department of Computer Science, AAU
- torp@cs.aau.dk
The Danish partnership for digitalization has concluded that there is a need to support the digital acceleration of the green transition. This includes strengthening efforts to establish a stronger data foundation for environmental data. Based on observations of the Earth, a range of Danish public organizations build and maintain important data foundations. Such foundations are used for decision-making, e.g., for executing environmental law or making planning decisions in both private and public organizations in Denmark.
The increasing possibilities of automated data collection and processing can decrease the cost of creating and maintaining such data foundations and improve services by providing more accurate and richer information. To realize such benefits, public organizations need to be able to utilize the new data sources that become available, e.g., to automate manual data curation tasks and increase the accuracy and richness of data. However, the organizations are challenged by the limited ability of available methods to efficiently combine the different sources of data for their use cases. This is particularly the case when user-facing tools must be constructed on top of the data foundation. The availability of better data will, among other things, help end-users decrease the cost of executing environmental law and making planning decisions. In addition, the ability of public data sources to provide more value to end-users improves the societal return on investment for publishing these data, which is in the interest of the public data providers as well as their end-users and society at large.
The Danish Environmental Protection Agency (EPA) has the option to receive data from many data sources but does not utilize this today because the lack of infrastructure makes it cost-prohibitive to take advantage of the data. Therefore, the EPA has expressed a need for methods to enable a data hub that provides data products combining satellite, orthophoto and IoT data. The Danish GeoData Agency (GDA) collects very large quantities of Automatic Identification System (AIS) data from ships sailing in Danish waters. However, the agency uses this data only to a very limited degree today. The GDA has a need for methods to enable a data hub that combines multiple sources of ship-based data, including AIS data, ocean observation data (sea level and sea temperature) and meteorological data. There is also a need for analytics on top of the hub that can provide services for estimating travel time at sea or finding the most fuel-efficient routes. This includes estimating the potential for lowering CO2 emissions at sea by following efficient routes.
Geo supports professional users in performing analysis of subsurface conditions based on their own extensive data, gathered from tens of thousands of geotechnical and environmental drilling operations, and on public sources. They deliver a professional software tool that presents this multimodal data in novel ways and are actively working on creating an educational platform giving high school students access to the same data. Geo has an interest in and need for methods for adding live, multimodal data to their platform, to support both professional decision makers and students. Furthermore, they have a need for novel ways of querying and representing such data, to make it accessible to professionals and students alike. Creating a testbed that combines Geo’s data with satellite feeds and automated processing to interpret this data will create new synergies and has the potential to greatly improve the visualizations of the subsurface by building detailed regional and national 3D voxel models.
Therefore, the key challenges that this project will address are how to construct scalable data warehouses for Earth observation data, how to design systems for combining and enriching multimodal data at scale, and how to design user-oriented data interfaces and analytics to support domain experts. Addressing these challenges will help the organizations produce better data for the benefit of the green transition of Danish society.
The aim of the project is to do use-inspired basic research on methods for multimodal processing of Earth observation data. The research will cover the areas of advanced and efficient big data management, software engineering, Internet of Things and machine learning. The project will conduct research in these areas in the context of three domain cases: with GDA on sea data and with EPA and Geo on environmental data.
Scalable data warehousing is the key challenge that work within advanced and efficient big data management will address. The primary research question is how to build a data warehouse with billions of rows of all relevant domain data. AIS data from GDA will be studied, and data cleaning will be addressed in addition to storage. On top of the data warehouse, machine learning algorithms must be enabled to compute the fastest and most fuel-efficient route between two arbitrary destinations.
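As an illustration of the kind of data cleaning mentioned above, the following is a minimal sketch and not the project's actual method. It assumes a hypothetical record layout (AisPoint with mmsi, timestamp, lat, lon) and a hypothetical speed threshold of 60 knots. For a single vessel it drops positions with out-of-range coordinates, repeated timestamps, and jumps that would imply an implausible speed.

```python
import math
from typing import NamedTuple

class AisPoint(NamedTuple):
    # Hypothetical record layout, chosen for illustration only.
    mmsi: int         # vessel identifier
    timestamp: float  # seconds since epoch
    lat: float        # degrees
    lon: float        # degrees

EARTH_RADIUS_KM = 6371.0
KM_PER_NAUTICAL_MILE = 1.852

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two positions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def clean_track(points, max_speed_knots=60.0):
    """Clean the positions of ONE vessel; the threshold is an assumed value."""
    cleaned = []
    for p in sorted(points, key=lambda q: q.timestamp):
        # Drop positions outside the valid coordinate range.
        if not (-90 <= p.lat <= 90 and -180 <= p.lon <= 180):
            continue
        if cleaned:
            prev = cleaned[-1]
            hours = (p.timestamp - prev.timestamp) / 3600.0
            if hours <= 0:
                continue  # repeated timestamp, keep the first record
            nautical_miles = haversine_km(prev.lat, prev.lon, p.lat, p.lon) / KM_PER_NAUTICAL_MILE
            if nautical_miles / hours > max_speed_knots:
                continue  # jump implies an implausible speed
        cleaned.append(p)
    return cleaned
```

In a warehouse setting, a filter of this kind would typically run per vessel during loading, so that downstream route and travel-time analytics operate on plausible trajectories only.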
Processing pipelines for multimodal data processing is the key topic for work within software engineering, Internet of Things and machine learning. The primary research question is how to engineer data processing pipelines that allow for enriching data through processes of transformation and combination. In the EPA case there is a need for enriching data by combining data both across sources (e.g., satellite and drone) and across modalities (e.g., the NDVI index for quantifying vegetation greenness is a function of a red and a near-infrared band). Furthermore, we will research methods for easing the process of bringing disparate data into a form that can be inspected by both a human and an AI user. For example, data sources are automatically cropped to a polygon representing a given area of interest (such as a city, municipality or country), normalized for comparability and subjected to data augmentation in order to improve machine learning performance. We will leverage existing knowledge on graph databases. We aim to facilitate the combination of satellite data with other sources such as sensor recordings at specific geolocations. This allows for advanced data analysis of a wide variety of phenomena, such as detection and quantification of objects and changes over time, which in turn allows for prediction of future occurrences.
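To make the band-combination step concrete: NDVI is commonly defined per pixel as (NIR - Red) / (NIR + Red), giving values between -1 and 1 for non-negative reflectances. The sketch below is an illustration under stated assumptions, not the project's pipeline. It assumes two equally shaped grids of reflectance values held as plain Python lists, and the function names ndvi and min_max_normalize are hypothetical. It also shows a simple min-max normalization of the kind that could be used to make layers comparable.

```python
def ndvi(red, nir):
    """Per-pixel NDVI for two equally shaped 2D grids of reflectance values."""
    result = []
    for red_row, nir_row in zip(red, nir):
        row = []
        for r, n in zip(red_row, nir_row):
            total = n + r
            # Pixels with no signal in either band are set to 0.0 (an assumed convention).
            row.append((n - r) / total if total != 0 else 0.0)
        result.append(row)
    return result

def min_max_normalize(grid):
    """Rescale all values of a 2D grid linearly to the range [0, 1]."""
    values = [v for row in grid for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant grid carries no contrast; map every pixel to 0.0.
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / (hi - lo) for v in row] for row in grid]

# Tiny usage example with made-up reflectance values.
red_band = [[0.1, 0.5]]
nir_band = [[0.5, 0.5]]
greenness = ndvi(red_band, nir_band)
scaled = min_max_normalize(greenness)
```

A production pipeline would more likely operate on raster arrays with georeferencing, but the arithmetic per pixel is the same.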
User-oriented data hubs and analytics is a cross-cutting topic with the aim to design interfaces and user-oriented analytics on top of data warehouses and processing pipelines. In the EPA case the focus is on developing a Danish data hub with Earth observation data. The solution must provide a uniform interface for working with the data and a user-centric view of data representation. This will then enable decision support systems, which will be worked on in the Geo case, that may be augmented by artificial intelligence and made understandable to human users through explorative graph-based user interfaces and data visualizations. For the GDA case the focus is on a web frontend for querying AIS data as trajectories and heat maps and for estimating the travel time between two points in Danish waters. As part of the validation, the data warehouse and related services will be deployed at GDA and serve as the foundation for future GDA services.
Advancing the means to process, store and use Earth observation data has many potential domain applications. To build world-class computer science research and innovation centres, as per the long-term goal of DIREC, this project focuses on building the competencies necessary to address challenges with Earth observation data, building on advances in efficient big data management, software engineering, Internet of Things and machine learning.
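A heat map over AIS positions can be thought of as a count of observations per grid cell. The following minimal sketch illustrates that idea only. The function name heatmap_counts and the cell size of 0.1 degrees are assumptions made for this example and do not describe the frontend's actual design.

```python
import math
from collections import Counter

def heatmap_counts(positions, cell_deg=0.1):
    """Count (lat, lon) positions per grid cell of cell_deg degrees.

    Each cell is identified by the integer pair
    (floor(lat / cell_deg), floor(lon / cell_deg)).
    """
    counts = Counter()
    for lat, lon in positions:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts

# Made-up positions: two fall in the same cell, one in another.
sample = [(57.05, 10.02), (57.07, 10.08), (55.61, 12.64)]
cells = heatmap_counts(sample)
```

The resulting counts can be mapped to colours on a map layer. A finer cell size gives more spatial detail at the cost of more cells to store and render.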
Scientific value
The project’s scientific value is the development of new methods and techniques for scalable data warehousing, processing pipelines for multimodal data and user-oriented data hubs and analytics. We expect to publish at least seven rank A research articles and to demonstrate the potential of the developed technologies in concrete real-world applications.
Capacity building
The project will build and strengthen the research capacity in Denmark directly through the education of two PhDs, and through the collaboration between researchers, domain experts, and end-users that will lead to R&D growth in the public and industrial sectors. Research competences that address a stronger digital foundation for the green transformation are important for Danish society and associated industrial sectors.
Societal and business value
The project will create societal and business value by providing the foundation for the Blue Denmark to reduce environmental and climate impact in Danish and Greenlandic waters to help support the green transformation. With ever-increasing human activity at sea, growing transportation of goods (90% of which are transported by ship) and the goal of a carbon-neutral European economy, there is a need for activating marine data to support this transformation. For the environmental protection sector, the project will provide the foundation for efforts to increase biodiversity in Denmark through better protection of fauna types and data-supported execution of environmental law. The project will provide significant societal value and directly contribute to SDGs 13 (climate action), 14 (life below water) and 15 (life on land).
In conclusion, the project will provide a strong contribution to the digital foundation for the green transition and support Denmark being a digital frontrunner in this area.
Impact
The project will provide the foundation for the Blue Denmark to reduce environmental and climate impact in Danish and Greenlandic waters to help support the green transition.