Data

The data used for this project are open data produced by the World Health Organization, the European Environment Agency, the World Bank and Eurostat.

The cancer mortality data originate from the WHO Mortality Database, a large interactive platform that gathers mortality data by disease and condition across countries and years. For our project, we used the dataset containing aggregated deaths from trachea, bronchus and lung cancer. Among the available mortality indicators, we chose the death rate per 100,000 inhabitants, so that the comparison is not influenced by the population size of individual countries, as it would be with a simple count of deaths.
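To make the choice of indicator concrete, a rate per 100,000 inhabitants is the death count divided by the population and scaled by 100,000. The sketch below computes a crude rate; whether the WHO figures used here are crude or age-standardized is not stated, and the two countries in the example are hypothetical.

```python
def death_rate_per_100k(deaths: int, population: int) -> float:
    """Crude death rate: deaths per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for integer inputs.
    return deaths * 100_000 / population


# Hypothetical countries: B has ten times the deaths of A, but also ten times the population.
rate_a = death_rate_per_100k(500, 1_000_000)
rate_b = death_rate_per_100k(5_000, 10_000_000)
print(rate_a, rate_b)  # 50.0 50.0
```

The raw counts differ tenfold while the rates are identical, which is exactly the population-size effect the rate is meant to remove.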

For air pollution, we used the air pollutant emissions dataset retrieved from the Air pollutant emissions data viewer (Gothenburg Protocol, LRTAP Convention) 1990-2020, operated by the European Environment Agency. It collects the yearly emissions of various pollutants from 1990 to 2020, broken down by emission source. For the scope of our project, we selected 8 pollutants related to lung diseases and kept only the category “National total for the entire territory (based on fuel sold)”, which covers a country's total emissions for the year. Emissions are reported in Gg (gigagrams), where 1 Gg corresponds to 1,000 tonnes. The emissions were then normalized by the land area of each country.
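The unit conversion and normalization amount to one multiplication and one division: Gg to tonnes (×1,000), then tonnes divided by land area in km². A minimal sketch, with a hypothetical emission total and area chosen only for illustration:

```python
TONNES_PER_GG = 1_000  # 1 Gg = 10^9 g = 1,000 tonnes


def emission_density(emissions_gg: float, land_area_km2: float) -> float:
    """Yearly national emissions in Gg -> emissions per unit land area in tonnes/km²."""
    if land_area_km2 <= 0:
        raise ValueError("land area must be positive")
    return emissions_gg * TONNES_PER_GG / land_area_km2


# Hypothetical country: 150 Gg emitted over 300,000 km² of land.
print(emission_density(150, 300_000))  # 0.5 (tonnes/km²)
```

Note that the result is an emission density per unit of land, not a concentration of the pollutant in the air.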

The land area of each country, used to normalize the emissions, was retrieved from the World Bank. This dataset gives a country's total area excluding the area under inland water bodies, national claims to the continental shelf, and exclusive economic zones. From the World Bank database, we selected only the data for the European countries considered in this project.
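Selecting the project's countries from the World Bank table is essentially a lookup by country code, and before normalizing it is worth checking that every country actually has an area value. A minimal sketch, assuming the table has been read into a dictionary keyed by a country code; the codes and areas below are hypothetical placeholders, not World Bank figures:

```python
def select_areas(world_bank_areas: dict[str, float],
                 project_countries: list[str]) -> tuple[dict[str, float], list[str]]:
    """Keep only the land areas of the project's countries and list those without a value."""
    selected: dict[str, float] = {}
    missing: list[str] = []
    for code in project_countries:
        area = world_bank_areas.get(code)
        if area is None:
            # No area available: emissions for this country cannot be normalized.
            missing.append(code)
        else:
            selected[code] = area
    return selected, missing


# Hypothetical codes and areas (km²), for illustration only.
areas = {"AAA": 100.0, "BBB": 250.0, "ZZZ": 9.0}
selected, missing = select_areas(areas, ["AAA", "BBB", "CCC"])
print(selected)  # {'AAA': 100.0, 'BBB': 250.0}
print(missing)   # ['CCC']
```

Reporting missing countries explicitly avoids silently dropping a country from the map when the join keys do not match.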

For the country geometries, we used the Eurostat Countries dataset at a scale of 1:10 million. Since this dataset contains the geometries of all the countries of the world, it was cropped to our area of interest.

Methods

The first part of the work was dedicated to adapting the data to our purpose. The WHO and EEA datasets, both in CSV format, were cleaned and organized in RStudio using the tidyverse packages. For the pollution dataset, the yearly emissions of each pollutant were normalized by the land area of the country, so pollution in our project is expressed in tonnes/km². The dataset was then split into 8 individual datasets, one per pollutant, so that each could be displayed as a separate layer in ArcGIS. Afterwards, all the pollutant datasets and the death rate dataset were joined to the polygon layer of European countries, which made it possible to use the processed data to create the map in ArcGIS Pro.

For the visualization of the pollutants, we created one choropleth map per pollutant, using varying saturation of blue. Blue was chosen as a neutral color that does not cause problems for people with color blindness and that thematically suits the subject of air. Our main interest was in the change of gradation: the more intense the blue, the higher the emissions of the pollutant per km². Conversely, the lower the emissions, the lighter the color, representing 'cleaner air'. The data were classified into 5 classes using the Natural Breaks (Jenks) method.
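Natural Breaks places class boundaries so that the values within each class are as similar as possible, i.e. it minimizes the total within-class sum of squared deviations. The sketch below is an exact dynamic-programming version of that optimization (Fisher's optimal classification); ArcGIS Pro's own Natural Breaks implementation may differ in details, so this is an illustration of the technique rather than a reproduction of the software.

```python
def jenks_breaks(values: list[float], n_classes: int) -> list[float]:
    """Return [min, upper bound of class 1, ..., upper bound of last class]."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums give the sum of squared deviations of any slice in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for data[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: minimal total SSD when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last class is data[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk back through the stored split points to recover class upper bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    bounds.reverse()
    return [data[0]] + bounds


print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [1, 3, 12, 22]
```

For a map like ours, `jenks_breaks(densities, 5)` on one pollutant's tonnes/km² values for a given year would yield the six boundary values of the five classes. The O(k·n²) cost is negligible for a few dozen countries.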

For the cancer mortality rate data, we chose a proportional symbol visualization with circles as symbols: the higher the rate, the larger the circle's radius. Red was selected because it is commonly used to signal an important and rather negative fact, which gives the map more impact on the audience. Red also contrasts clearly with the blue choropleth maps, so it should not cause difficulties for people with color blindness. We applied some transparency to the circles so that overlapping circles and the underlying country boundaries remain visible. Finally, the time settings were adapted to the time series, covering the range from 1990 to 2020 with a step of one year.
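The text above only states that a higher rate gives a larger radius. A widely used convention for proportional symbols is to scale the circle's area, rather than its radius, with the value, so that a doubled rate does not look four times as large. A minimal sketch of that convention, assuming a hypothetical maximum radius of 30 pixels (ArcGIS Pro's proportional-symbol settings have their own sizing options, which may work differently):

```python
import math


def circle_radius(value: float, max_value: float, max_radius: float = 30.0) -> float:
    """Radius such that circle AREA is proportional to value (r grows with sqrt(value))."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    if value < 0:
        raise ValueError("value must be non-negative")
    return max_radius * math.sqrt(value / max_value)


# A rate four times smaller than the maximum gets half the radius, i.e. a quarter of the area.
print(circle_radius(100, 100))  # 30.0
print(circle_radius(25, 100))   # 15.0
```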

In a final step, the data were shared on ArcGIS Online, where the last formatting changes (legend, pop-ups, etc.) were applied. We then created the web app from the web map with Web AppBuilder and arranged the various elements according to our visualization preferences.

Web AppBuilder imposed two limitations. First, it does not appear to support radio buttons for layer selection, which would have been ideal for the pollutant layers, since only one pollutant at a time can be shown meaningfully as a choropleth map. We therefore added a note to the map description advising users to enable only one pollutant layer at a time. Second, the Time widget offered few options for adapting it to our needs, so the animation through the years turned out slow and 'bouncy'. An attempt to replace the time sequence with a scroll bar failed, so instead we explained to the audience how best to use the widget: stepping through the years at their own pace with the arrows rather than playing the animation.

To support a quick and thorough understanding of the project, we created accompanying graphs that present some of the most striking results together with a broader overview of the topic.