Bridge project

Multimodal Data Processing of Earth Observation Data

DIREC project

Summary

Based on Earth observations, a number of Danish public organizations build and maintain important data foundations that are used for decision-making, e.g., for executing environmental law or making planning decisions in both private and public organizations in Denmark.  

Together with some of these public organizations, this project aims to support the digital acceleration of the green transition by strengthening the data foundation for environmental data. There is a need for public organizations to utilize new data sources and create a scalable data warehouse for Earth observation data. This will involve building processing pipelines for multimodal data processing and designing user-oriented data hubs and analytics. 

Project period: 2022-2025
Budget: DKK 12.27 million

Project Manager

  • Professor Kristian Torp
  • Department of Computer Science, AAU
  • torp@cs.aau.dk

The Danish partnership for digitalization has concluded that there is a need to support the digital acceleration of the green transition. This includes strengthening efforts to establish a stronger data foundation for environmental data. Based on observations of the Earth, a range of Danish public organizations build and maintain important data foundations. Such foundations are used for decision-making, e.g., for executing environmental law or making planning decisions in both private and public organizations in Denmark.

The increasing possibilities for automated data collection and processing can decrease the cost of creating and maintaining such data foundations and improve services by providing more accurate and richer information. To realize such benefits, public organizations need to be able to utilize the new data sources that become available, e.g., to automate manual data curation tasks and increase the accuracy and richness of data. However, the organizations are challenged by the limited ability of available methods to efficiently combine the different sources of data for their use cases. This is particularly the case when user-facing tools must be constructed on top of the data foundation. The availability of better data will, among other things, help end-users decrease the cost of executing environmental law and making planning decisions. In addition, the ability of public data sources to provide more value to end-users improves the societal return on investment for publishing these data, which is in the interest of the public data providers as well as their end-users and society at large.

The Danish Environmental Protection Agency (EPA) has the option to receive data from many data sources but does not utilize this today because the lack of infrastructure makes it cost-prohibitive to take advantage of the data. Therefore, the EPA is expressing a need for methods to enable a data hub that provides data products combining satellite, orthophoto and IoT data. The Danish GeoData Agency (GDA) collects very large quantities of Automatic Identification System (AIS) data from ships sailing in Denmark. However, the GDA uses this data only to a very limited degree today. The GDA has a need for methods to enable a data hub that combines multiple sources of ship-based data including AIS data, ocean observation data (sea level and sea temperature) and meteorological data. There is also a need for analytics on top that can provide services for estimating travel time at sea or finding the most fuel-efficient routes. This includes estimating the potential for lowering CO2 emissions at sea by following efficient routes.

Geo supports professional users in performing analysis of subsurface conditions based on their own extensive data, gathered from tens of thousands of geotechnical and environmental drilling operations, and on public sources. They deliver a professional software tool that presents this multimodal data in novel ways and are actively working on creating an educational platform giving high school students access to the same data. Geo has an interest in and need for methods for adding live, multimodal data to their platform, to support both professional decision makers and students. Furthermore, they have a need for novel ways of querying and representing such data, to make it accessible to professionals and students alike. Creating a testbed that combines Geo's data with satellite feeds and automated processing to interpret this data will create new synergies and has the potential to greatly improve the visualizations of the subsurface by building detailed regional and national 3D voxel models.

Therefore, the key challenges that this project will address are how to construct scalable data warehouses for Earth observation data, how to design systems for combining and enriching multimodal data at scale, and how to design user-oriented data interfaces and analytics to support domain experts. Addressing these challenges will help the organizations produce better data for the benefit of the green transition of Danish society.

The aim of the project is to do use-inspired basic research on methods for multimodal processing of Earth observation data. The research will cover the areas of advanced and efficient big data management, software engineering, Internet of Things and machine learning. The project will conduct research in these areas in the context of three domain cases: one with GDA on sea data and two with EPA and GEO on environmental data.

Scalable data warehousing is the key challenge that the work on advanced and efficient big data management will address. The primary research question is how to build a data warehouse holding billions of rows of all relevant domain data. AIS data from GDA will be studied, and data cleaning will be addressed in addition to storage. On top of the data warehouse, machine learning algorithms must be enabled to compute the fastest and most fuel-efficient routes between two arbitrary destinations.
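To make the data cleaning step concrete, the following is a minimal sketch of how a single vessel's AIS track might be filtered before loading it into a warehouse. It is not the project's method: the `AisPoint` record, the `clean_track` function and the 60-knot plausibility threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
KM_PER_NAUTICAL_MILE = 1.852
MAX_SPEED_KNOTS = 60.0  # assumed plausibility threshold, not taken from the project


@dataclass(frozen=True)
class AisPoint:
    """Hypothetical minimal AIS position report."""
    mmsi: int         # vessel identifier
    timestamp: float  # seconds since epoch
    lat: float        # degrees
    lon: float        # degrees


def haversine_km(a: AisPoint, b: AisPoint) -> float:
    """Great-circle distance between two positions in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def clean_track(points, max_speed_knots=MAX_SPEED_KNOTS):
    """Sort one vessel's points by time and drop out-of-range positions,
    repeated timestamps and jumps that imply an implausible speed."""
    cleaned = []
    for p in sorted(points, key=lambda q: q.timestamp):
        if not (-90 <= p.lat <= 90 and -180 <= p.lon <= 180):
            continue  # coordinates outside the valid range
        if cleaned:
            prev = cleaned[-1]
            hours = (p.timestamp - prev.timestamp) / 3600.0
            if hours <= 0:
                continue  # duplicate timestamp
            speed = haversine_km(prev, p) / KM_PER_NAUTICAL_MILE / hours
            if speed > max_speed_knots:
                continue  # implied speed is implausible for a ship
        cleaned.append(p)
    return cleaned
```

Note that this simple filter trusts the first valid point of a track; a production pipeline would need a more robust rule for tracks that start with an outlier.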

Processing pipelines for multimodal data processing are the key topic for the work within software engineering, Internet of Things and machine learning. The primary research question is how to engineer data processing pipelines that allow for enriching data through processes of transformation and combination. In the EPA case there is a need for enriching data by combining data both across sources (e.g., satellite and drone) and across modalities (e.g., the NDVI index for quantifying vegetation greenness is a function of a red and a near-infrared band). Furthermore, we will research methods for easing the process of bringing disparate data into a form that can be inspected both by a human and by an AI user. For example, data sources are automatically cropped to a polygon representing a given area of interest (such as a city, municipality or country), normalized for comparability and subjected to data augmentation in order to improve machine learning performance. We will leverage existing knowledge on graph databases. We aim to facilitate the combination of satellite data with other sources such as sensor recordings at specific geolocations. This allows for advanced data analysis of a wide variety of phenomena, such as detection and quantification of objects and changes over time, which in turn allows for prediction of future occurrences.
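As an illustration of combining modalities, the standard NDVI definition is (NIR - Red) / (NIR + Red), computed per pixel from the near-infrared and red reflectance bands. The sketch below is a minimal pure-Python version; the function names `ndvi` and `ndvi_raster` and the nested-list raster layout are assumptions for illustration, and a real pipeline would more likely operate on array or raster libraries.

```python
def ndvi(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red).

    Returns None when both bands are zero, where the index is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def ndvi_raster(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally shaped bands (lists of rows)."""
    if len(nir_band) != len(red_band):
        raise ValueError("bands must have the same number of rows")
    result = []
    for nir_row, red_row in zip(nir_band, red_band):
        if len(nir_row) != len(red_row):
            raise ValueError("bands must have the same number of columns")
        result.append([ndvi(n, r) for n, r in zip(nir_row, red_row)])
    return result
```

Values close to 1 indicate dense green vegetation, while values near zero or below typically correspond to bare soil, built-up areas or water.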

User-oriented data hubs and analytics is a cross-cutting topic with the aim of designing interfaces and user-oriented analytics on top of data warehouses and processing pipelines. In the EPA case the focus is on developing a Danish data hub with Earth observation data. The solution must provide a uniform interface for working with the data and a user-centric view of data representation. This will then enable decision support systems, which will be worked on in the GEO case, that may be augmented by artificial intelligence and made understandable to human users through explorative graph-based user interfaces and data visualizations. For the GDA case the focus is on a web frontend for querying AIS data as trajectories and heat maps and for estimating the travel time between two points in Danish waters. As part of the validation, the data warehouse and related services will be deployed at GDA and serve as the foundation for future GDA services.
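A heat map of ship positions can be approximated by counting position reports per grid cell. The following is a minimal sketch under stated assumptions: the function name `heatmap_counts`, the input format (latitude/longitude pairs in degrees) and the 0.1-degree cell size are illustrative choices, not details from the project.

```python
from collections import Counter
from math import floor


def heatmap_counts(positions, cell_deg=0.1):
    """Count positions per grid cell.

    positions: iterable of (lat, lon) pairs in degrees.
    Returns a Counter keyed by (row, col) cell indices, where a cell
    covers cell_deg degrees in each direction.
    """
    counts = Counter()
    for lat, lon in positions:
        cell = (floor(lat / cell_deg), floor(lon / cell_deg))
        counts[cell] += 1
    return counts
```

A frontend could then map each cell's count to a colour. Because cells of equal size in degrees cover less area at higher latitudes, an equal-area grid may be preferable for quantitative comparisons.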

Advancing the means to process, store and use Earth observation data has many potential domain applications. To build world-class computer science research and innovation centres, as per the long-term goal of DIREC, this project focuses on building the competencies necessary to address challenges with Earth observation data, drawing on advances in advanced and efficient big data management, software engineering, Internet of Things and machine learning.

Scientific value
The project’s scientific value is the development of new methods and techniques for scalable data warehousing, processing pipelines for multimodal data and user-oriented data hubs and analytics. We expect to publish at least seven rank A research articles and to demonstrate the potential of the developed technologies in concrete real-world applications.

Capacity building
The project will build and strengthen the research capacity in Denmark directly through the education of two PhDs, and through the collaboration between researchers, domain experts, and end-users that will lead to R&D growth in the public and industrial sectors. Research competences that address a stronger digital foundation for the green transformation are important for Danish society and associated industrial sectors.

Societal and business value
The project will create societal and business value by providing the foundation for the Blue Denmark to reduce environmental and climate impact in Danish and Greenlandic waters to help support the green transformation. Human activity at sea is ever-increasing, transportation of goods is growing with 90% being transported by shipping, and Europe has a goal of an economy based on carbon neutrality, so there is a need for activating marine data to support this transformation. For the environmental protection sector the project will provide the foundation for efforts to increase the biodiversity in Denmark by better protection of fauna types and data-supported execution of environmental law. The project will provide significant societal value and directly contribute to SDGs 13 (climate action), 14 (life below water) and 15 (life on land).

In conclusion, the project will provide a strong contribution to the digital foundation for the green transition and support Denmark being a digital frontrunner in this area.

Impact

The project will provide the foundation for the Blue Denmark to reduce environmental and climate impact in Danish and Greenlandic waters to help support the green transition.  


Bridge project

Mobility Analytics using Sparse Mobility Data and Open Spatial Data

DIREC project

Summary

Both society and industry have a substantial interest in well-functioning outdoor and indoor mobility infrastructures that are efficient, predictable, environmentally friendly, and safe. For outdoor mobility, reduction of congestion is high on the political agenda as is the reduction of CO2 emissions, as the transportation sector is the second largest in terms of greenhouse gas emissions. For indoor mobility, corridors and elevators represent bottlenecks for mobility in large building complexes.  

The amount of mobility-related data has increased massively which enables an increasingly wide range of analyses. When combined with digital representations of road networks and building interiors, this data holds the potential for enabling a more fine-grained understanding of mobility and for enabling more efficient, predictable, and environmentally friendly mobility.   

Project period: 2021-2024
Budget: DKK 9.41 million

Project Manager

  • Professor Christian S. Jensen
  • Department of Computer Science, AAU
  • csj@cs.aau.dk

The mobility of people and things is an important societal process that facilitates and affects the lives of most people. Thus, society, including industry, has a substantial interest in well-functioning outdoor and indoor mobility infrastructures that are efficient, predictable, environmentally friendly, and safe. For outdoor mobility, reduction of congestion is high on the political agenda; it is estimated that congestion costs Denmark 30 billion DKK per year. Similarly, the reduction of CO2 emissions from transportation is on the political agenda, as the transportation sector is the second largest in terms of greenhouse gas emissions. Danish municipalities are interested in understanding the potential for integrating various types of e-bikes in transportation planning. Increased use of such bicycles may contribute substantially to the greening of transportation and may also ease congestion and thus improve travel times. For indoor mobility, corridors and elevators represent bottlenecks for mobility in large building complexes (e.g. hospitals, factories and university campuses). With the addition of mobile robots, humans and robots will also be competing for the same space when moving indoors. Heavy use of corridors is also a source of noise that negatively impacts building occupants.

The ongoing, sweeping digitalisation has also reached outdoor and indoor mobility. Thus, increasingly massive volumes of mobility-related data, e.g. from sensors embedded in the road and building infrastructures, networked positioning (e.g. GPS or UWB) devices (e.g. smartphones and in-vehicle navigation devices) or indoor mobile robots, are becoming available. This enables an increasingly wide range of analyses related to mobility. When combined with digital representations of road networks and building interiors, this data holds the potential for enabling a more fine-grained understanding of mobility and for enabling more efficient, predictable, and environmentally friendly mobility. Long movement times equate with congestion and bad overall experiences.

The above data foundation offers a basis for understanding how well a road network or building performs across different days and across the duration of a day, and it offers the potential for decreased movement times by means of improved mobility flows and routing. However, there is an unmet need for low-cost tools that can be used by municipalities and building providers (e.g. mobile robot manufacturers) and that are capable of enabling a wide range of analytics on top of mobility data.

  1. Build extract-transform-load (ETL) prototypes that are able to ingest high and low frequency spatial data (e.g. GPS and indoor positioning data). These prototypes must enable map-matching of spatial data to open road network and building representations and must enable privacy protection.
  2. Design effective data warehouse schemas that can be populated with ingested spatial data.
  3. Build mobility analytics warehouse systems that are able to support a broad range of analyses in interactive time.
  4. Build software systems that enable users to formulate analyses and visualise results in maps-based interfaces for both indoor and outdoor use. This includes infrastructure for the mapping of user input into database queries and the maps-based display of results returned by the data warehouse system.
  5. Develop a range of advanced analyses that address user needs. Possible analyses include congestion maps, isochrones, aggregate travel-path analyses, origin-destination travel time matrices, and what-if analyses where the effects of reconstruction are estimated (e.g. adding an additional lane to a stretch of road or changing corridors). For outdoor settings, CO2-emissions analyses based on vehicular environmental impact models and GPS data are also considered.
  6. Develop transfer learning techniques that make it possible to leverage spatial data from dense spatio-temporal “regions” for enabling analyses in sparse spatio-temporal regions.
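One of the analyses listed above, the isochrone, can be sketched as a shortest-path search that stops at a travel-time budget. The example below is a minimal illustration using Dijkstra's algorithm on a small road or corridor graph; the function name `isochrone` and the adjacency-list format with travel times in seconds are assumptions, not the project's actual interface.

```python
import heapq


def isochrone(graph, origin, budget_s):
    """Return {node: travel_time_s} for every node reachable from origin
    within budget_s seconds.

    graph maps each node to a list of (neighbour, travel_time_s) edges
    with non-negative travel times.
    """
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if elapsed > best.get(node, float("inf")):
            continue  # stale queue entry; a faster path was already found
        for neighbour, cost in graph.get(node, []):
            arrival = elapsed + cost
            if arrival <= budget_s and arrival < best.get(neighbour, float("inf")):
                best[neighbour] = arrival
                heapq.heappush(queue, (arrival, neighbour))
    return best


# Usage: which nodes can be reached from "A" within three minutes?
road_graph = {
    "A": [("B", 60), ("C", 200)],
    "B": [("C", 90), ("D", 300)],
    "C": [("D", 100)],
    "D": [],
}
reachable = isochrone(road_graph, "A", 180)
```

In a warehouse setting the edge travel times would come from map-matched trajectory data rather than fixed constants, and they could vary by time of day.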

Value creation
The envisioned prototype software infrastructure characterised above aims to be able to replace commercial road network maps with the crowdsourced OpenStreetMap (OSM) map and, for indoor settings, to enable new data sources about the indoor geography. The open data might not be curated, which means that new quality control tools are required to ensure that computed travel times are correct. Using open data in place of commercial maps will reduce cost.

Next, the project will provide means of leveraging available spatial data as efficiently and effectively as possible. In particular, while more and more data becomes available, the available data will remain sparse in relation to important analyses. This is due to the cost of data that can be purchased and due to the lack of desired data. Thus, it is important to be able to exploit available data as well as possible. We will examine how to transfer data from locations and times with ample data to locations and times with insufficient data. For example, we will study transfer learning techniques for this purpose; and as part of this, we will study feature learning. This will reduce cost and will enable new analyses that were not possible previously due to a lack of data.

Rambøll will be able to in-source the software infrastructure and host analytics for municipalities. Mobile Industrial Robotics (MiR) will be able to in-source the software infrastructure and host analytics for building owners. Additional value will be created because the above studies will be conducted for multiple transportation modes, with a focus on cars and different kinds of e-bikes. We have access to a unique data foundation that will enable these studies.

Impact

The project will provide a prototype software infrastructure that aims to be able to replace commercial road network maps with the crowdsourced OpenStreetMap (OSM) and, for indoor settings, to enable new data sources about the indoor geography.

The open data might not be curated, which means that new quality control tools are required to ensure that computed travel times are correct. This will reduce cost.
