In times of crisis, timely and accurate geospatial data is crucial for effective humanitarian response. This GIS Day 2024, we discuss a new project: an automated data pipeline to streamline the collection and preparation of essential geospatial datasets for emergencies. By replicating the data scramble process our GIS teams typically perform during emergencies, the MapAction Automated Data Pipeline aims to expedite the delivery of critical information to those who need it most.
By: Evangelos Diakatos, MapAction Data Engineer
By automating the acquisition of the baseline datasets our maps rely on, the pipeline aims to reduce the time required to gather and prepare data during emergencies. It also aims to improve accuracy by providing up-to-date and consistent datasets for mapping and analysis, so that the GIS team can focus on critical analysis and map production rather than manual data collection. This supports a more rapid and effective humanitarian response.
Data Sources
The pipeline integrates data from several key sources. One of the primary sources is the Humanitarian Data Exchange (HDX), a platform hosted by OCHA that offers a wide range of humanitarian datasets. HDX provides access to critical information necessary for planning and coordinating emergency responses.
Another important source is Google Earth Engine (GEE), a cloud-based platform that facilitates the processing of satellite imagery and other geospatial datasets. Additionally, the pipeline retrieves data from OpenStreetMap (OSM), a collaborative project aimed at creating a free, editable map of the world. OSM provides detailed geographical information, including roads, buildings, and points of interest.
Figure: Example of a baseline map produced by MapAction during a past emergency response. Our new data pipeline aims to automate the acquisition and processing of the data used in these maps, such as administrative boundaries, transport infrastructure and geographic features.
The datasets collected and processed by the pipeline are mainly those needed in the first moments after the onset of an emergency. They describe the country or region’s situation before the emergency and form the baseline of our maps, which are then enriched with situational information as the emergency develops. Examples include administrative boundaries, geographic features such as rivers and lakes, population distribution, and infrastructure (e.g., roads, airports, hospitals).
Technology stack
These datasets are gathered mainly through APIs. APIs, or Application Programming Interfaces, are sets of rules that allow different software applications to communicate with each other. By interfacing with various APIs, the pipeline can fetch the latest data directly from the source. This helps keep the information used in analyses up-to-date and consistent, providing a reliable foundation for emergency response efforts.
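To make the idea of fetching data through an API more concrete, here is a minimal Python sketch. It is an illustration only, not the pipeline's actual code. It assumes that HDX exposes the standard CKAN action API at the base URL shown, and the function names and the sample response are hypothetical. The sketch builds a dataset search URL and then picks download links out of a response shaped like a CKAN search result, without making any network call.

```python
from urllib.parse import urlencode

# Assumption: HDX serves the standard CKAN action API under this base URL.
HDX_API_BASE = "https://data.humdata.org/api/3/action"


def build_search_url(query: str, rows: int = 5) -> str:
    """Build a CKAN package_search URL for a free-text dataset query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{HDX_API_BASE}/package_search?{params}"


def resource_urls(response: dict, wanted_format: str) -> list[str]:
    """Collect download URLs for resources in the wanted file format."""
    urls = []
    for dataset in response.get("result", {}).get("results", []):
        for resource in dataset.get("resources", []):
            # The format field may be missing or empty, so fall back to "".
            fmt = (resource.get("format") or "").lower()
            if fmt == wanted_format.lower():
                urls.append(resource["url"])
    return urls


# Hand-written sample shaped like a CKAN search response (not real HDX data).
sample_response = {
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {
                "name": "example-admin-boundaries",
                "resources": [
                    {"format": "GeoJSON", "url": "https://example.org/adm1.geojson"},
                    {"format": "SHP", "url": "https://example.org/adm1.zip"},
                ],
            }
        ],
    },
}

search_url = build_search_url("kenya administrative boundaries", rows=3)
geojson_links = resource_urls(sample_response, "geojson")
```

In a real run, the URL would be requested over HTTP and the JSON reply passed to the second function; keeping the two steps separate makes each one easy to check without a network connection.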
Figure: Pipeline architecture showing the different steps, from data acquisition to storage.
The MapAction data pipeline is constructed using a combination of Python and Bash scripts. Python is a versatile programming language known for its readability and extensive libraries, making it ideal for data processing tasks. Bash scripts facilitate the automation of command-line operations in a Linux environment.
To ensure portability and consistency across different computing environments, the pipeline operates within a Linux Docker container. Docker is a platform that uses containers to package applications and their dependencies, allowing for seamless deployment across various systems.
Process orchestration is handled by Apache Airflow, an open-source workflow management platform. It schedules and monitors workflows, manages task dependencies, and ensures that data processing steps occur in the correct order.
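The idea of task dependencies can be illustrated without Airflow itself. The sketch below uses Python's standard-library `graphlib` module, not Airflow's API, to compute a valid execution order for a handful of tasks. The task names are hypothetical and only loosely modelled on the steps described in this article.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks it depends on.
dependencies = {
    "download_admin_boundaries": set(),
    "download_roads": set(),
    "clip_roads_to_country": {"download_admin_boundaries", "download_roads"},
    "upload_outputs": {"clip_roads_to_country"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one ordering in which every task runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
```

An orchestrator such as Airflow works from the same kind of dependency graph, and adds scheduling, retries and monitoring on top of it.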
Next steps
The next phase for the MapAction Automated Data Pipeline involves rigorous validation of the results and testing during actual emergency responses. By integrating the pipeline into live operations, we can assess its effectiveness and make necessary adjustments. Initially, the tool will be made available for internal use within MapAction, allowing our GIS team to benefit from its capabilities while we continue to refine its functionality.
In the future, we aim to adopt an event-driven pipeline approach, enabling automatic initiation of data processing in response to specific triggers such as GDACS disaster alerts. Additionally, we plan to develop an interactive dashboard that allows for manual configuration of pipeline runs, giving users greater control over data collection parameters.
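As a rough sketch of what such a trigger could look like, the Python example below decides which countries to process from a list of alerts. GDACS classifies alerts as Green, Orange or Red; everything else here is an assumption made for illustration, including the `Alert` fields, the function name and the default threshold. It does not describe a planned implementation.

```python
from dataclasses import dataclass

# GDACS alert levels, ranked from least to most severe.
ALERT_RANK = {"green": 0, "orange": 1, "red": 2}


@dataclass(frozen=True)
class Alert:
    """Hypothetical, simplified view of a disaster alert."""
    event_id: str
    country_iso3: str
    level: str


def countries_to_process(alerts: list[Alert], min_level: str = "orange") -> list[str]:
    """Return the sorted, de-duplicated countries whose alerts meet the threshold."""
    threshold = ALERT_RANK[min_level]
    selected = {
        alert.country_iso3
        for alert in alerts
        # Unknown levels rank below every threshold and are ignored.
        if ALERT_RANK.get(alert.level.lower(), -1) >= threshold
    }
    return sorted(selected)


# Made-up alerts for illustration only.
sample_alerts = [
    Alert("EQ-001", "PHL", "Red"),
    Alert("FL-002", "MOZ", "Orange"),
    Alert("TC-003", "PHL", "Orange"),
    Alert("DR-004", "KEN", "Green"),
]

triggered = countries_to_process(sample_alerts)
```

Each country returned could then start one pipeline run, and raising `min_level` to "red" would restrict runs to the most severe events.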
Ultimately, after thorough internal testing and refinement, we hope to make the pipeline available to the broader humanitarian community. By sharing this tool publicly, we aim to support other organisations in enhancing their emergency response efforts through improved data accessibility and efficiency.
MapAction’s work in humanitarian response is funded by USAID’s Bureau for Humanitarian Assistance (BHA) and the German Federal Foreign Office’s Programme for Humanitarian Assistance.