MapAction Project Ideas

As part of our Moonshot initiative, we are working to automate the production of our core reference maps, initially for 20 priority countries, with the aim of extending this automation framework to many more countries over time. To achieve this, we are developing an automated framework to ingest, process, and visualise a variety of datasets for each country. All of the ideas below relate to this stream of work and reference two key areas of software development:

Automated map production through ‘MapChef’: GitHub

MapChef is our Python-based map automation tool. It relies on our existing map templates, Data Naming Convention, and “map recipe files” to create a subset of our core maps in static PDF format. In the future we want MapChef to handle a greater range of products, not just PDF maps. It should also take advantage of our work on data quality to make better decisions about which datasets to use, depending on the requirements defined in each product recipe.
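To illustrate the recipe-driven approach, the sketch below shows how a recipe might declare the layers a product needs. The field names and structure here are illustrative assumptions, not the actual MapChef recipe schema (which lives in the MapChef repository).

```python
# Hypothetical map recipe: the keys below ("product", "layers", etc.) are
# illustrative assumptions, not MapChef's actual recipe format.
RECIPE = {
    "product": "Country Overview",
    "output_format": "pdf",
    "layers": [
        {"name": "admin_boundaries", "required": True},
        {"name": "roads", "required": True},
        {"name": "health_facilities", "required": False},
    ],
}


def required_layers(recipe):
    """Return the names of layers the product cannot be built without."""
    return [layer["name"] for layer in recipe["layers"] if layer["required"]]
```

A tool following this pattern could compare `required_layers(RECIPE)` against the datasets actually available for a country before attempting to render the product.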

Automated data acquisition and processing through our MVP pipeline framework: GitHub

We are developing a standardized framework for how we handle data acquisition and processing. This framework is currently implemented using Google Cloud Platform and Apache Airflow. Initially, this will be used to provide data to our automated mapping processes (through MapChef). As the capability grows we intend to use this systematic approach to identify data gaps at regional levels.

Projects appropriate for Google Summer of Code 2021.

Administrative boundary data quality reports

Theme: Data quality

Background: Our data collection efforts (manual or automated) often leave us with multiple options for the same geographic feature. For example, we may find ourselves with administrative boundary datasets from OSM, GADM, and geoBoundaries. It is time-consuming to compare each of these options manually. We need an automated way to assess the quality of various administrative boundary datasets, allowing us to make a well-informed selection of the best available option to use in our humanitarian mapping.

Desired outcomes: A Python module that generates a data quality report for one or more input administrative boundary datasets. The report should summarize the basic attributes and topological characteristics of the data, be based on a well-defined set of quality attributes relevant to administrative boundaries, and be presented in a user-friendly way, so that a MapAction team member can quickly make a well-informed decision about which option is best suited for the task at hand.
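A minimal sketch of the attribute side of such a report, working on plain feature-attribute dicts (e.g. GeoJSON feature properties). The metrics and the `admin_code` field name are illustrative assumptions, not MapAction's actual quality definition; a real module would add topological checks (gaps, overlaps, invalid geometries) on the geometries themselves.

```python
from collections import Counter


def boundary_quality_report(features, code_field="admin_code"):
    """Summarise basic attribute-quality signals for one admin-boundary
    dataset. `features` is a list of attribute dicts; the field name and
    metrics are illustrative assumptions only."""
    codes = [f.get(code_field) for f in features]
    missing = sum(1 for c in codes if c in (None, ""))
    duplicates = [c for c, n in Counter(c for c in codes if c).items() if n > 1]
    return {
        "feature_count": len(features),
        "missing_code_count": missing,
        "duplicate_codes": sorted(duplicates),
    }
```

Running the same report over the OSM, GADM, and geoBoundaries candidates would give a side-by-side comparison to support the selection decision.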

Stretch: Integration of the above Python module with our Pipeline MVP and Jira tasking system, allowing for (semi) automated decision-making.

Stretch: Expansion to other data types beyond administrative boundaries, such as road networks and/or health facilities.

Required skills: Python programming, knowledge of geospatial data quality and GIS

MapChef QGIS plugin

Theme: Automated mapping

Background: Our map automation tool, MapChef, is currently limited to using the Esri ArcMap API to produce maps. We would like to extend this by creating a plugin that uses the QGIS API to produce equivalent maps, following the same overall design and process as the current Esri plugin.

Desired outcomes: MapChef software able to output (a) QGIS project files and (b) exported map files produced using QGIS, with styling according to MapAction conventions.
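One piece of such a plugin that can be sketched without the QGIS runtime is matching layer file names to style definitions. The regex pattern and the category-to-style mapping below are illustrative assumptions, loosely inspired by (but not identical to) MapAction's Data Naming Convention; the `.qml` paths are placeholders.

```python
import re

# Assumed pattern: a three-letter country code, then a data-category token.
# This is an illustration, not the real Data Naming Convention.
LAYER_NAME = re.compile(r"^(?P<country>[a-z]{3})_(?P<category>[a-z]+)_")

# Hypothetical mapping from data category to a QGIS style file.
STYLE_FOR_CATEGORY = {
    "admn": "styles/admin_boundaries.qml",
    "tran": "styles/roads.qml",
}


def style_for_layer(filename):
    """Pick a style file for a layer based on its name, or None if the
    name does not follow the assumed convention."""
    match = LAYER_NAME.match(filename)
    if not match:
        return None
    return STYLE_FOR_CATEGORY.get(match.group("category"))
```

In the plugin itself, the chosen `.qml` would then be applied to the layer through the QGIS API before the project file is saved and the map exported.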

Stretch: integration with vector tile plugin described as a separate idea on this page.

Required skills: Python programming, knowledge of QGIS API, regular expressions, GIS (non-essential)

MapChef vector tile plugin

Theme: Automated mapping

Context: As we expand our capacity to produce web mapping products, we have identified vector tiles as a key technology of interest that will allow us to produce fully customizable web maps. Following an initial proof of concept that we have developed, we would like to investigate how a pipeline to produce vector tiles could be integrated into our MapChef software.

Desired outcomes: MapChef software able to output vector tiles with basic reference data, with styling according to MapAction conventions.

Stretch: Simple user-facing client application to view vector tiles in the browser.

Stretch: Vector tile data storage and management solution.

Required skills: Python programming, data management, cloud hosting, data visualization/cartography

Extend Data Pipelines to cover more countries

Theme: Data engineering

Context: We recently implemented a series of data processing pipelines on Google Cloud using Python and Apache Airflow. MapAction’s use case is focused on obtaining data on a country-by-country basis. We aim to have three data acquisition and processing pipelines defined for 20 countries by May 2021, and for 40 countries by December 2021. You would expand the coverage of countries by following the already defined data processing patterns for the following data types: SRTM elevation data, COD-AB country boundaries (hosted by HDX), and roads (from OpenStreetMap).

Desired outcomes: Configurations added manually for 20 countries.

Desirable: A generalized approach to adding countries to the pipeline programmatically, rather than generating new config files by hand.

Stretch: Re-organising the logic of the current DAGs, which are structured around data type, to instead group data by country in line with MapAction’s use case.
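A sketch of the programmatic approach: deriving one configuration per country and dataset from a single country list, instead of maintaining a config file per country. The config keys and dataset identifiers are illustrative assumptions, not the pipeline's actual schema.

```python
# Assumed dataset identifiers for the three existing pipeline patterns;
# the real pipeline's names may differ.
DATASETS = ["srtm_elevation", "cod_ab_boundaries", "osm_roads"]


def build_country_configs(iso3_codes):
    """Return one pipeline config dict per (country, dataset) pair."""
    return [
        {"country": iso3, "dataset": dataset}
        for iso3 in iso3_codes
        for dataset in DATASETS
    ]
```

In Airflow, a list like this can then drive dynamic DAG or task generation in a loop, so adding a country becomes a one-line change to the country list.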

Required skills: Python programming, cloud development, Apache Airflow, GIS (non-essential)

Extend Data Pipelines to cover more data products

Theme: Data engineering

Context: Our data pipeline framework (described above) is currently implemented to process two minimum required datasets: country boundaries from HDX and roads from OSM. We would like to extend this by developing additional data processing pipelines to handle additional data products of value to our standard humanitarian mapping products.

Desired outcomes: A pipeline for a data product of operational value to MapAction that enables automated data acquisition and storage in our internal data management system (files stored on Google Drive File Stream). The pipeline should also perform basic transformations so that the ingested data matches our standard naming conventions, and should be integrated with our existing MVP framework based on Google Cloud and Airflow.
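The renaming step might look like the sketch below. The `iso3_category_layer.ext` pattern is an illustrative assumption only; the real Data Naming Convention carries more components than this.

```python
def standardise_name(country_iso3, category, layer, extension):
    """Build a file name in a MapAction-like convention.

    The pattern used here (iso3_category_layer.ext, lower case) is an
    illustrative assumption; the actual Data Naming Convention is richer.
    """
    parts = [country_iso3, category, layer]
    return "_".join(p.strip().lower() for p in parts) + "." + extension.lower()
```

A pipeline task would call a function like this when writing each acquired file into the managed storage location.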

Stretch: Pipeline integrates basic data quality and/or schema checks.

Required skills: Python programming, cloud development, Apache Airflow, GIS (non-essential)