Environmental Performance Index 2008 [BETA]

Methodology

We believe that transparency is essential for good analysis and helps in setting concrete policy targets. This appendix provides a detailed description of the steps included in calculating the 2008 EPI and the statistical techniques used. The issues addressed in the following sections mirror those commonly encountered in the computation of composite indices: indicator and country selection, missing data treatment, standardization, aggregation and weighting methodologies, as well as performance testing (OECD, 2003).

Country Selection Criteria
Ideally, the EPI should include all of the world’s countries and territories. However, persistent data gaps require that we balance geographical coverage against the validity and accuracy of available data. Wherever possible, and in line with our goal of providing a reliable and accurate picture of environmental performance of every country in the set, the 2008 EPI contains only countries with complete data coverage across all indicators and policy categories, with the following exceptions:
  • Inclusion in the Fisheries indicator requires that countries have at least one of the two constituent indicators (Trawling Intensity and Marine Trophic Index).
  • Inclusion in the Productive Natural Resources policy category requires countries to have at least two of the three constituent indicators (Forestry, Fisheries, Agriculture). For some indicators, such as those in the Productive Natural Resources category, data availability depends in part on a country’s geographical location. Countries with no forests, no active fishing fleets and industries, or no land used in agriculture may be missing some indicators associated with those activities but should be, and are, still included in the EPI.
  • We imputed values for some countries for three indicators in the Environmental Health policy category: Drinking Water, Adequate Sanitation and Environmental Burden of Disease; Water Quality in the Water category; Agricultural Subsidies in the Productive Natural Resources category; as well as the indicators in the Climate Change category. In the case of the Drinking Water and Adequate Sanitation data, there is a very high correlation between the indicator data, as well as a rich body of literature and practitioners’ knowledge on the relationships between these measures and development. This knowledge base permits us to use available data to impute any missing values. The table below includes the complete list of indicators for which data were either averaged or imputed:

Indicator Name | Indicator Code | Missing Data Method
Environmental Burden of Disease | DALY | Imputation based on income per capita
Adequate Sanitation | ACSAT | Imputation based on income per capita (log) and WATSUP
Drinking Water | WATSUP | Imputation based on income per capita (log)
Water Quality | WATQI | Imputation based on regional average and non-reporting penalties
Critical Habitat Protection | AZE | Averaged around for countries with no AZE sites
Growing Stock Change | FORGRO | Imputation based on percentage change in forest cover, 2000-2005
Marine Protected Areas | MPAEEZ | Averaged around for countries with no EEZ
Irrigation Stress | IRRSTR | Averaged around for countries with no agricultural land
Intensive Cropland | AGINT | Averaged around for countries with no agricultural land
Greenhouse Gas Emissions Per Capita | GHGCAP | GHG emission imputation based on CO2 (CDIAC); land emission imputation based on regional average of emissions per square kilometer
Agricultural Subsidies | AGSUB | Imputation based on the 2006 EPI’s AGSUB proximity-to-target score; missing 2008 AGSUB values were given scores that correspond to equivalent proximity-to-target scores
Emissions per Kilowatt Hour of Energy Produced | CO2KWH | Imputation based on renewable energy as a percentage of all energy production
Industrial Carbon Intensity | CO2IND | Imputation based on CO2 emissions per GDP

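
To make the income-based imputations in the table more concrete, the following is a minimal sketch of one way such an imputation could work: a straight-line least-squares fit of an indicator on the logarithm of income per capita, used to fill in missing values. The table only states which predictors were used, so the model form, the function names (fit_line, impute_from_log_income), and the sample numbers are all assumptions for illustration, not the procedure actually used for the 2008 EPI.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def impute_from_log_income(income, indicator):
    """Fill None entries of `indicator` from a fit on log(income).

    Assumption: a simple linear model in log income; the source does not
    specify the model form.
    """
    observed = [(math.log(g), v) for g, v in zip(income, indicator) if v is not None]
    intercept, slope = fit_line([x for x, _ in observed], [y for _, y in observed])
    return [
        v if v is not None else intercept + slope * math.log(g)
        for g, v in zip(income, indicator)
    ]


# Hypothetical data: income per capita and percent of population with access.
income = [1000.0, 10000.0, 100000.0, 3000.0]
access = [40.0, 70.0, 100.0, None]
filled = impute_from_log_income(income, access)
```

Observed values pass through unchanged; only the missing entry is replaced by the fitted value.
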
Target Selection

An additional challenge arises from the difficulty of determining clear performance targets for some of the indicators. For instance, in Europe, sulfur dioxide emission targets are based on sophisticated monitoring and modeling exercises that permit detailed, differentiated targets that take into account differences in emission trajectories, deposition sensitivities, and mitigation costs. There is no corresponding information base for assigning differential targets on a global basis, nor has there been any similar negotiating process to lend such targets legitimacy and authority. Therefore, our global target on sulfur dioxide (reduction to zero) is cruder than we would expect a fully mature global sulfur dioxide policy regime to adopt. Nonetheless, we consider such crude targets useful for the purpose of broad comparison among countries, both within single issues and collectively across multiple issues.

Missing Data

Despite improvements, data gaps remain a very serious obstacle to a more refined EPI and to data-driven policymaking more generally. Many countries, particularly in the developing world, lack data on a number of critical indicators. More generally, persistent data gaps, lack of time series data, or incomparability of data across countries mean that several important policy challenges cannot be addressed adequately at present. For instance, air quality indicators based on ground monitoring are unavailable for many developing countries and are further limited by weak data comparability even in developed countries. Combined with the dependence of conditions on local environmental and socio-economic characteristics, this severely limits the possibility of imputing data from one location to another.

Missing data is a major source of uncertainty in index construction. Although increasingly sophisticated statistical methods exist for imputing missing data, they entail assumptions regarding the causes for the missing values. In addition, application of these methods requires knowledge and careful consideration of the strengths and weaknesses of various techniques in light of the available data. To continue the air pollution example, such data are highly dependent on spatial and temporal conditions, which complicates the development of imputation models that are applicable to different regions and countries. In addition, the essence of the EPI, as a gauge of actual environmental results, requires particular confidence that any numbers imputed reflect ground-level circumstances and outcomes. We have used well-recognized imputation models to impute missing data for a number of indicators, as noted above.

Still, the lack of data limits the comprehensiveness of the EPI. In the air pollution context, pollutants such as lead, ultra-fine particulate matter (PM2.5), and volatile organic compounds (VOCs) do not have sufficient ground observations available and are not updated on a sufficiently frequent basis to permit robust performance metrics. Although satellite-based observation of air pollutants is advancing rapidly and provides more reliable estimates to fill in the gaps, availability and use of these technologies is still constrained. The result of these data gaps and inconsistencies is that only measures of regional ozone and sulfur dioxide emissions are included in the 2008 EPI to represent the ecological dimension of air pollution. The lack of adequate data indicates the need for increased national and international efforts to improve data collection, particularly for air quality measures.

More work remains to be done to both address the lack of available information on environmental policy issues and reduce serious shortcomings in the quality, geographical coverage, or timeliness of the available data. Since the publication of the Pilot 2006 EPI, we have been able to compile data for the crucial issues of biodiversity and conservation measures, fisheries data, and climate change. On the other hand, we are still calling on organizations and governmental bodies involved in environmental monitoring and data collection to invest in initiatives to assemble measures, in particular within the following fields:

  • Concentrations of additional criteria air pollutants
  • Exposure to toxic chemicals
  • Blood lead levels
  • Soil degradation
  • Sector-specific greenhouse gas emissions
  • Pesticide application
  • Effectiveness of protected area management
  • Deposition of sulfur dioxide compared to critical loads

We hope that increased initiative will make it possible to fill these data gaps in the future.

Calculation of the EPI and Policy Category Sub-Indices
Indicator Transformation for Cross-Country Comparisons

Environmental data are measured on various scales and require standardization to permit cross-country comparisons. Standardization also ensures that no indicator dominates the aggregated EPI and policy indices, and conveys information about a country’s environmental performance in an easy-to-understand yet meaningful way, using a scale that quickly reveals a country’s position vis-à-vis other countries as well as with respect to desirable performance outcomes. For these reasons, the 2008 EPI, as in the Pilot 2006 EPI, uses a proximity-to-target approach that evaluates how close a country is to a desirable performance target for each of the 25 indicators.

Initially, we examined the distribution of each indicator to identify whether extreme values skew the aggregations of some indicators. Our analysis concluded that the extreme values are more indicative of being “outliers” (values numerically much larger or smaller than the rest of the distribution) than of being the realizations of a skewed distribution. Accordingly, we adjusted outliers using a recognized statistical technique called winsorization. Winsorization essentially involves setting values falling below a lower tail percentile of the data distribution (e.g., the 2.5th percentile) equal to that percentile value, and observations above the corresponding upper tail percentile (e.g., the 97.5th percentile) equal to that percentile value. Our decision rule for choosing the percentiles beyond which observations would be winsorized was as follows: if the ratio of the 97.5th percentile value to the 95th percentile value (or of the 5th percentile value to the 2.5th percentile value) was greater than 5, indicating a large spread between them, we winsorized at the 5th or 95th percentile.
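
The winsorization step and its decision rule can be sketched as follows. This is an illustrative sketch only: the percentile interpolation scheme, the function names (percentile, choose_cutoffs, winsorize), the fallback to the 2.5th and 97.5th percentiles when the ratio test is not triggered, and the sample data are all assumptions. The ratio test also presumes strictly positive indicator values.

```python
def percentile(sorted_vals, p):
    """Linearly interpolated percentile (p in 0-100) of a pre-sorted list."""
    rank = (len(sorted_vals) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)


def choose_cutoffs(values, ratio_threshold=5.0):
    """Pick the winsorization percentiles using the ratio rule from the text.

    Assumption: the 2.5th / 97.5th percentiles are used unless the ratio
    between neighbouring tail percentiles exceeds the threshold.
    """
    s = sorted(values)
    p2_5, p5, p95, p97_5 = (percentile(s, p) for p in (2.5, 5.0, 95.0, 97.5))
    lower_p = 5.0 if p2_5 > 0 and p5 / p2_5 > ratio_threshold else 2.5
    upper_p = 95.0 if p95 > 0 and p97_5 / p95 > ratio_threshold else 97.5
    return lower_p, upper_p


def winsorize(values):
    """Clamp values to the chosen lower and upper percentile values."""
    lower_p, upper_p = choose_cutoffs(values)
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower_p), percentile(s, upper_p)
    return [min(max(v, lo_cut), hi_cut) for v in values]


# Hypothetical indicator values with one extreme observation.
mild = list(range(1, 41)) + [10000]
mild_winsorized = winsorize(mild)

# Hypothetical values with a large gap between the 95th and 97.5th percentiles.
heavy = list(range(1, 40)) + [1000, 5000]
heavy_cutoffs = choose_cutoffs(heavy)
```

In the first sample the single extreme value is pulled back to the 97.5th percentile value; in the second, the large spread in the upper tail moves the upper cutoff to the 95th percentile.
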

Following the adjustment of outliers and extremely skewed indicators, the proximity-to-target values are calculated as follows:

  • For indicators where large observations indicate better performance: 100 - [(target value - winsorized value) × 100 / (target value - minimum winsorized value)]

  • For indicators where small observations indicate better performance: 100 - [(winsorized value - target value) × 100 / (maximum winsorized value - target value)]

This calculation is based on how far each country is from attaining the target score for each indicator and ensures comparability across the 25 indicators. In addition to its simplicity, this transformation also allows the interpretation of a country’s performance as the shortfall from achieving the target expressed in percent. For instance, a country’s score of 80 for the Drinking Water indicator means that it is 20% short of meeting the target; in this case 20% of the population does not have access to drinking water. It should be noted, however, that the standardization technique described here does not eliminate differential spreads in the data among the indicators, i.e., the variance of each indicator is not standardized and thus indicators still contribute somewhat differently to the aggregated policy and EPI scores.
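
As a worked illustration of the two proximity-to-target formulas, here is a minimal sketch. The function name proximity_to_target, its parameter names, and the sample values are assumptions for illustration; `worst` stands for the minimum winsorized value when larger is better and the maximum winsorized value when smaller is better.

```python
def proximity_to_target(value, target, worst, higher_is_better=True):
    """Proximity-to-target score on a 0-100 scale.

    `worst` is the minimum winsorized value (higher_is_better=True) or the
    maximum winsorized value (higher_is_better=False).
    """
    if higher_is_better:
        return 100.0 - (target - value) * 100.0 / (target - worst)
    return 100.0 - (value - target) * 100.0 / (worst - target)


# Hypothetical Drinking Water case: target 100% coverage, minimum winsorized
# value 0%, and a country with 80% coverage.
drinking_water_score = proximity_to_target(80.0, target=100.0, worst=0.0)

# Hypothetical "smaller is better" case: target 0, maximum winsorized value 50.
emissions_score = proximity_to_target(
    10.0, target=0.0, worst=50.0, higher_is_better=False
)
```

With these assumed inputs both countries are 20% short of their targets, so both scores come out at 80. Note that the direct reading "20% of the population lacks access" holds here only because the assumed minimum winsorized value is 0.
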

For the majority of indicators, the choice of these targets is based on generally accepted sustainability criteria, international treaties, scientific and expert judgments, but in some cases, such as sulfur dioxide emissions, no such targets are available due to lack of international agreement and/or the significant influence of local ecological and other conditions. In such instances, the specification of a performance target had to be based on pragmatic realities rather than ideal goals.

We decided not to give countries exceeding specified targets additional “performance credits”; rather, we have set their score to the target. This form of “target winsorization” is done to reduce the ability of countries to use above-target performance in one area to make up for poor performance on other indicators. Since the majority of indicator targets also reflect sustainability criteria, it could even be argued that overachievement is an inefficient deployment of a country’s resources. In some cases, moreover, above-target results may be a function of data anomalies or reporting errors.
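
The effect of this target capping can be seen in a small sketch. The function name cap_at_target and the scores are hypothetical, and a plain unweighted mean stands in for the EPI's weighted aggregation purely for illustration.

```python
def cap_at_target(score, target_score=100.0):
    """Set any above-target proximity score back to the target score."""
    return min(score, target_score)


# Hypothetical raw scores: one indicator above target, one far below it.
raw_scores = [120.0, 40.0]
capped_scores = [cap_at_target(s) for s in raw_scores]

# Unweighted means, for illustration only.
uncapped_mean = sum(raw_scores) / len(raw_scores)
capped_mean = sum(capped_scores) / len(capped_scores)
```

Without the cap, the above-target indicator would lift the mean to 80; with the cap it is 70, so overachievement in one area cannot offset the shortfall in the other.
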

Data Quality and Coverage

Despite persistent data gaps and problems with the comparability and the spatial and temporal coverage of relevant environmental data, the 2008 EPI is an important step forward in our ability to measure country-level, policy-driven progress toward identified environmental goals.

More work remains to be done to both address the lack of available information on environmental policy issues and reduce serious shortcomings in the quality, geographical coverage, or timeliness of the available data. Since the publication of the Pilot 2006 EPI, we have been able to compile data for these important issues: biodiversity and conservation measures, fisheries data, and climate change. On the other hand, we are still calling on organizations and governmental bodies involved in environmental monitoring and data collection to invest in initiatives to assemble needed metrics and data.

Hopefully, continued efforts will make it possible to fill these data gaps in the future.

Of further relevance in the context of data coverage is consideration of how environmental pollution and resource use affect countries at different stages of economic development. The cluster analysis and presentation of EPI results for various “country peer groups” highlight that different EPI indicators are of high importance to various country groupings. While this is an important issue for weighting the indicators, it also demonstrates that indicator selection for a global index is a difficult task. While our search for additional and better data is ongoing, this EPI contains 25 indicators for 149 countries, which we believe reflect the most important and best available measures to track and assess environmental performance. Aside from policy relevance, only datasets with sufficient coverage, data “freshness”, and methodological consistency were chosen.

Cluster Analysis

Cluster analysis refers to a rich suite of statistical classification methods used to determine similarities (or dissimilarities) of objects in large datasets. We use this technique to identify groupings of relevant peer countries. Within each peer group, countries have a better basis for benchmarking their environmental performance because the group members are similar with respect to the data used to classify them, so the technique provides a good starting point in the search for best practices.

Cluster Analysis Techniques

There is no best method for conducting cluster analysis, and the results of such analyses are subject to interpretation. We applied two different algorithms to explore the data structure, beginning with a non-parametric, distance-based agglomerative clustering algorithm known as Ward’s method.

Agglomerative clustering begins with as many individual clusters as there are data points (in this case, countries). It then successively combines countries that are most similar to each other with respect to a quantitative similarity measure until all countries are joined in a single cluster.

The similarity measure decreases during this process, while the within-cluster dissimilarity increases as more and more countries are added. The tradeoff lies therefore in choosing a similarity measure, or “pruning value”, that yields both a relatively small number of clusters and a high level of similarity. We determined that seven clusters yield a reasonable division between the countries.
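
The agglomerative procedure can be illustrated with a naive sketch of Ward's criterion, in which the pair of clusters merged at each step is the one whose union gives the smallest increase in total within-cluster sum of squares. The function names (ward_merge_cost, ward_clustering) and the sample points are assumptions, and this cubic-time loop is meant only to show the idea, not to reproduce the software used for the EPI analysis.

```python
def ward_merge_cost(size_a, center_a, size_b, center_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    dist2 = sum((x - y) ** 2 for x, y in zip(center_a, center_b))
    return size_a * size_b / (size_a + size_b) * dist2


def ward_clustering(points, n_clusters):
    """Merge clusters bottom-up until only `n_clusters` remain."""
    # Each cluster is (size, centroid, member indices); start with singletons.
    clusters = [(1, list(p), [i]) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_merge_cost(
                    clusters[i][0], clusters[i][1], clusters[j][0], clusters[j][1]
                )
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (na, ca, ma), (nb, cb, mb) = clusters[i], clusters[j]
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merged = (na + nb, centroid, ma + mb)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c[2]) for c in clusters]


# Hypothetical two-indicator scores for five countries.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
groups = ward_clustering(points, n_clusters=3)
```

Stopping the merging at a chosen number of clusters plays the role of the "pruning value" described above; recording the merge cost at each step would show where similarity starts to drop sharply.
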

After determining the number of country clusters, we use the k-means clustering method developed by Hartigan and Wong (1979) to determine cluster membership. K-means is a non-hierarchical method that requires that the number of clusters, k, be specified up front (hence the preliminary use of Ward’s method) and then iteratively finds the disjoint partition of the objects into k homogeneous groups such that the sum of squares within the clusters is minimized. As long as the data are not skewed, each variable receives approximately the same weight in the clustering. We thus used the proximity-to-target indicators, divided by the square roots of the weights allocated to them in the 2008 EPI, so that the sum-of-squares (variance-like) calculations of k-means would be on the scale of these weights. We also centered the indicators at 0, so positive or negative values in the clustering summary of the group centers indicate better or worse than average performance. The k-means clustering algorithm coupled with Hartigan’s ‘rule of thumb’ indicates 5-7 clusters. After reviewing cluster membership when 5, 6, or 7 clusters are chosen, we selected the 7-cluster solution as it was most interpretable from an environmental performance and socio-economic development perspective.
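
The scaling, centering, and partitioning steps can be sketched as follows. Two caveats: this sketch uses the simple assign-then-update (Lloyd-style) iteration rather than the Hartigan and Wong algorithm named in the text, and the function names, weights, starting centers, and scores are all hypothetical. The scaling follows the text as written (division by the square root of each weight).

```python
import math


def scale_and_center(rows, weights):
    """Divide each indicator by sqrt(weight), then subtract its column mean."""
    scaled = [[v / math.sqrt(w) for v, w in zip(row, weights)] for row in rows]
    means = [sum(col) / len(scaled) for col in zip(*scaled)]
    return [[v - m for v, m in zip(row, means)] for row in scaled]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, initial_centers, max_iter=100):
    """Lloyd-style k-means: assign to nearest center, then recompute centers."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(row, centers[k]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical proximity-to-target scores on two indicators for four countries.
scores = [[90.0, 80.0], [85.0, 75.0], [30.0, 20.0], [35.0, 25.0]]
weights = [0.25, 0.25]  # hypothetical indicator weights
prepared = scale_and_center(scores, weights)
labels, centers = k_means(prepared, initial_centers=[prepared[0], prepared[2]])
```

Because the indicators are centered at 0, the first group's center has positive coordinates (better than average) and the second group's has negative coordinates (worse than average).
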

Specific Observations

Several interesting patterns became apparent during the cluster analysis process. Firstly, there is a strong association between a country’s EPI score and its Ecosystem Vitality score, and the former cannot be lower than the latter. The same rule does not hold true with the EPI and Environmental Health scores, where an association exists, but top performers show a tail.

It also became apparent that there are some trends in the data at the indicator level. Six countries received scores that are far lower than the median for Fisheries, while there are many countries which receive the top score for Forestry. This pattern naturally lends itself towards two clusters: those countries at the top, and those that are not. Almost all countries score very well on the Air Quality (relating to Environmental Health) indicator, but a country’s score for biodiversity shows very low correlation with its score on any other indicator.

Download: EPI 2008 References (.pdf)


Comments
Theo Dijkstra (Mar 27, 2008): The intensity is measured relative to GDP. Why not per capita?
Mark Müser (May 20, 2008): I think in the formula for the calculation of the proximity to target values is a mistake. It should say (for example for indicators where large observations indicate better performance): 100 – [(target value – winsorized value) x 100 / (target value – minimum winsorized value)] So there is one bracket too much in your formula. And there is also a mistake in this formula in the downloadable Main Report — Text Version(.pdf) on page 39. On page 39 there is not a bracket to much but one bracket is not at the correct position.
alam (Sep 08, 2008): It is a good effort, but it is not wise to calculate a value for missing data or to list a country without having data or a reliable data source. This gives misleading information about a country’s EPI. People tend to compare relative positions using this misleading information (without thinking about the limitations you have), which challenges the reliability of your index. The relative positions of some countries, like the Netherlands, South Africa, Taiwan, and the USA, do not seem acceptable. Without good data, it is not wise to include a country in the list. Moreover, this index is somewhat biased/overweighted by per capita income/GDP. This is not logical, because it is generally accepted that the higher a country’s income, the more natural resources it uses, which has an adverse environmental impact. Also, these countries have shifted their environmental burden to other countries in a globalised economy. Higher income does not necessarily go in line with better environmental performance.