Meso Level: Technological Capabilities of Nations

Jupyter Notebook link

In the second part of the fourth section, the focus lies on the technological capabilities of countries. The methods used here closely follow the methodology presented in the third section and the analyses made in the first part of this section, with some small differences and additional analyses.

1.Characterisation of Countries

Capability Matrices

The first result represents the biofuel research ecosystem of a country as a capability matrix. This matrix is produced by applying the same term-pair methodology presented at the macro level, but the documents are filtered by location instead of by year. For example, the capability matrix for Denmark is built from the technological assets (patents, publications, projects) located in Denmark or owned (or co-owned) by Danish organizations. To introduce the concept, the normalized capability matrix was produced for two example countries, Sweden and Denmark. The following figure and table show both matrices side by side, together with some indicative properties.

Matrix Properties   Denmark   Sweden
Dimensions          352x352   352x352
Mean                1.71e-05  1.71e-05
Standard Deviation  3.95e-04  2.80e-04
Maximum             3.90e-02  3.06e-02
Minimum             0.00      0.00
Symmetry            True      True

Normalized Capability Matrices of Denmark and Sweden

When looking at the properties of both matrices, some observations can be made:

  • Both matrices are symmetrical and equal in dimensions, which is expected given that both are built from the same dictionary of biofuel-related terms.

  • The maximum, minimum and mean values of both matrices are generally similar.

  • The standard deviation of the capability matrix of Denmark is about 40% higher, which would suggest that Denmark has a wider usage of different term pairs. On the other hand, Sweden’s capabilities are more “focused”.
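The properties listed in the table can be computed directly from a matrix. The following is a minimal sketch using NumPy on a toy 3x3 matrix; the function name `matrix_properties` and the toy values are illustrative assumptions, not the code or data used in the original analysis.

```python
import numpy as np

def matrix_properties(matrix):
    """Summarise a square capability matrix with the properties from the table."""
    m = np.asarray(matrix, dtype=float)
    return {
        "dimensions": m.shape,
        "mean": float(m.mean()),
        "std": float(m.std()),
        "max": float(m.max()),
        "min": float(m.min()),
        # A term-pair matrix should equal its own transpose.
        "symmetric": bool(np.allclose(m, m.T)),
    }

# Toy symmetric matrix standing in for a normalized capability matrix
# (hypothetical values, for illustration only).
toy = np.array([
    [0.0, 0.2, 0.1],
    [0.2, 0.0, 0.0],
    [0.1, 0.0, 0.0],
])
props = matrix_properties(toy)
print(props)
```

Running the same function on the matrices of two countries gives the side-by-side comparison shown in the table.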

Capability Lists

Following the same approach as the macro analysis, a capability matrix can be transformed into a capability list by taking its upper triangle and appending each entry to a vector. At this level of the analysis, and as a proof of concept, the capability lists of the United States of America and China are presented side by side. However, due to the large number of entries in these lists (58482), the differences are rather difficult to visualize. Consequently, this concept will be revisited in more detail in the final part of this section.

Country capability lists of China and the USA.
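The transformation from matrix to list can be sketched as follows. This is an illustrative example on a toy matrix; whether the diagonal is included in the list is not stated in the text, so the `include_diagonal` parameter (a hypothetical name) leaves that choice open.

```python
import numpy as np

def capability_list(matrix, include_diagonal=True):
    """Flatten the upper triangle of a symmetric matrix into a 1-D vector."""
    m = np.asarray(matrix, dtype=float)
    # k=0 keeps the diagonal, k=1 starts one position above it.
    k = 0 if include_diagonal else 1
    rows, cols = np.triu_indices(m.shape[0], k=k)
    return m[rows, cols]

# Toy symmetric matrix (hypothetical values).
toy = np.array([
    [1, 2, 3],
    [2, 4, 5],
    [3, 5, 6],
])
with_diagonal = capability_list(toy)
without_diagonal = capability_list(toy, include_diagonal=False)
print(with_diagonal)     # entries of the upper triangle, row by row
print(without_diagonal)  # same, but skipping the diagonal
```

Because the matrix is symmetric, the upper triangle carries all of its information, which is why the list can replace the matrix in later comparisons.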

2.Country Correlation Matrix and Profiles

Country Correlation Matrices

With the goal of applying the same engineering systems approach to the meso level as was applied to the macro level, the Pearson correlation index was used as an indicator of the similarity between the capability lists of two countries. For example, the Pearson correlation index between the US and China lists is approximately 0.65, or 65%. This can be loosely read as the biofuel research of these two countries being 65% similar. To visualize this for all countries, the country correlation matrix was created.
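As an illustration, a country correlation matrix can be built from capability lists with NumPy's `corrcoef`. The country names "A", "B" and "C" and their values below are hypothetical, and the function name is an assumption made for this sketch.

```python
import numpy as np

def country_correlation_matrix(capability_lists):
    """Pearson correlation between every pair of capability lists.

    capability_lists: dict mapping country -> sequence of equal length.
    Returns the sorted country names and the square correlation matrix.
    """
    countries = sorted(capability_lists)
    data = np.array([capability_lists[c] for c in countries], dtype=float)
    # Each row is treated as one variable (one country).
    # Note: a country whose list is constant would yield NaN entries.
    corr = np.corrcoef(data)
    return countries, corr

# Hypothetical toy capability lists.
lists = {
    "A": [0.0, 0.2, 0.4, 0.0],
    "B": [0.0, 0.1, 0.2, 0.0],  # proportional to A
    "C": [0.4, 0.0, 0.0, 0.2],  # uses different term pairs than A
}
countries, corr = country_correlation_matrix(lists)
print(countries)
print(np.round(corr, 2))
```

In this toy example, A and B are perfectly correlated because their lists are proportional, while A and C are negatively correlated because they emphasise different term pairs.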

After creating this matrix, and just as was done for each year, a hierarchical clustering algorithm was applied to it to identify possible clusters of countries that are more similar to each other. The clustering also produced a dendrogram, which gives a quick view of which countries are most related to one another. For example, if this dendrogram were cut at the level n=2, forming clusters of two countries, Denmark would be paired with Portugal, and the United States with Taiwan. Interestingly, one can observe three main cluster areas in the ordered matrix:

  • On the top left side of the matrix, an area of highly related countries that range from France to Serbia. (see axis of second figure)

  • On the bottom right side of the matrix, an area of related countries which, on average, are less related than those in the top left but form a separate group. (Belgium, Hong Kong, Hungary, Tunisia...)

  • In the middle, a cross-like area of countries which are not particularly related to each other or to any other country. (El Salvador, UAE, Scotland...)
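The reordering of the matrix by clustering can be sketched with SciPy. The text does not state which distance or linkage method was used, so the choice of `1 - correlation` as distance and of average linkage below is an assumption, as are the function name and the toy values.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.spatial.distance import squareform

def cluster_countries(corr, n_clusters):
    """Hierarchically cluster countries from their correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    # Assumption: highly correlated countries are "close" to each other.
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    order = leaves_list(tree)  # row order used to redraw the matrix
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return order, labels

# Hypothetical correlation matrix with two obvious groups: (0, 1) and (2, 3).
toy_corr = np.array([
    [1.00, 0.90, 0.10, 0.20],
    [0.90, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.80],
    [0.20, 0.10, 0.80, 1.00],
])
order, labels = cluster_countries(toy_corr, n_clusters=2)
ordered_matrix = toy_corr[np.ix_(order, order)]
print(order, labels)
```

Reindexing the matrix with `order` places similar countries next to each other, which is what makes block-like areas visible in the ordered matrix.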

Country Correlation Profiles

While clustering is an interesting way of visualizing the general trends between countries, it does not explicitly show which countries are most related to a given country. To visualize this, country profiles were created. A country profile is built by “cutting” the country correlation matrix at a particular row (country) and ordering the results. In the figure below, the country profile of Denmark is presented. On the y axis, the Pearson correlation index (x100) is used as a measure of similarity between countries.

This graph is a simple way of quickly visualizing the countries most similar to Denmark in terms of biofuel research. For example, the country most related to Denmark is Spain, with an index of about 60%, followed by Portugal with an index of ~58%, and so on. Interestingly, the most similar countries are not necessarily geographically close to Denmark, but they are close to each other. Sweden, for example, is related to Denmark by a factor of 50%, and Norway by only ~30%. Following this method, one could say that, in terms of biofuel-related capability matrices, Norway is as similar to Denmark as Colombia is.

Country profile of Denmark.
Country profiles of Denmark, Portugal and the United Kingdom
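Building a country profile amounts to reading one row of the correlation matrix and sorting it. A minimal sketch, with hypothetical countries "A", "B", "C" and made-up correlation values:

```python
def country_profile(countries, corr, target):
    """Return (country, correlation) pairs for `target`, most similar first."""
    i = countries.index(target)
    # Skip the target itself, whose correlation with itself is always 1.
    others = [(c, corr[i][j]) for j, c in enumerate(countries) if j != i]
    return sorted(others, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3x3 correlation matrix as nested lists.
countries = ["A", "B", "C"]
corr = [
    [1.0, 0.3, 0.6],
    [0.3, 1.0, 0.5],
    [0.6, 0.5, 1.0],
]
profile = country_profile(countries, corr, "A")
print(profile)
```

Multiplying the correlation values by 100 gives the scale used on the y axis of the profile figures.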

3.Contextual Relationships

GDP per capita

Using the World Bank as the source for the Gross Domestic Product (GDP) per capita values in $US, the GDP per capita difference was calculated for every country pair. The goal is to understand whether the GDP per capita of two countries is indicative of their technological similarity. In the first plot produced, presented below, each data point is a pair of countries. The x axis shows the Pearson correlation (0-1) between the two countries, and the y axis shows the absolute GDP per capita difference of that same pair. For readability purposes, country pairs with a capability similarity of less than 10%, or 0.1, were excluded from the graph.

When observing the graph, one can notice that most country pairs have less than a 40% capability correlation and less than 40000$US GDP per capita difference. Looking at the dashed guidelines in the graph, the further a guideline is from the origin, the fewer country pairs appear. Moreover, countries that are more related (higher capability correlation) generally have a more similar GDP per capita. For example, Brazil and Zimbabwe have a capability correlation of 88.60% (0.886) and a GDP per capita difference of 7620.87$US, which is rather low.

Capability correlation and GDP per capita difference of country pairs.

However, the graph above loses an important dimension: it is hard to distinguish country pairs just from their GDP per capita difference. Take, for example, the country pairs Sweden-Singapore and Romania-Brazil. Both pairs have a low GDP per capita difference; however, the first pair is made of economically developed countries, and the second of generally underdeveloped countries. The graph above treats them equally.

In order to add an extra dimension to this visualization, the graph below was produced. Here, the average GDP per capita of each country pair is also shown as a color scale. For instance, Sweden-Singapore is light blue, and Romania-Brazil is red.

Capability correlation and GDP per capita difference of country pairs. Interactive version: https://plot.ly/~duarteocarmo/32/
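The quantities plotted in the two graphs above can be sketched as follows. The function name, the data layout and all numeric values are hypothetical; only the 0.1 correlation threshold, the absolute difference and the pair average come from the text.

```python
from itertools import combinations

def gdp_pair_points(gdp_per_capita, correlation, min_correlation=0.1):
    """Build one data point per country pair.

    gdp_per_capita: dict country -> GDP per capita.
    correlation: dict (country_a, country_b) -> capability correlation,
                 keyed with the two names in alphabetical order.
    """
    points = []
    for a, b in combinations(sorted(gdp_per_capita), 2):
        r = correlation[(a, b)]
        if r < min_correlation:
            continue  # excluded from the graph for readability
        difference = abs(gdp_per_capita[a] - gdp_per_capita[b])
        average = (gdp_per_capita[a] + gdp_per_capita[b]) / 2  # color scale
        points.append((a, b, r, difference, average))
    return points

# Hypothetical values for three made-up countries.
gdp = {"A": 50000.0, "B": 48000.0, "C": 9000.0}
corr = {("A", "B"): 0.5, ("A", "C"): 0.05, ("B", "C"): 0.3}
points = gdp_pair_points(gdp, corr)
print(points)
```

The pair A-C is dropped because its correlation is below the threshold; the remaining tuples hold the x value, the y value and the color value of each point.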

Collaboration

The second contextualization does not come from an external data source; instead, it was obtained from the database itself. By querying the database, it was possible to retrieve, for each country pair, the number of technological assets on which the two countries collaborated.

By plotting the number of shared assets of a country pair against the capability correlation of that same pair, the graph below was produced. In it, four different areas can be observed (listed here with example pairs):

                             Low number of shared assets       High number of shared assets
High capability correlation  Similar and not collaborating:    Similar and collaborating:
                             Argentina - Iran                  Austria - Germany
                             Belgium - China                   Belgium - Germany
                             Brazil - Costa Rica               Denmark - Germany
                             Canada - Denmark                  Finland - Germany
Low capability correlation   Different and not collaborating:  Different and collaborating:
                             Australia - Argentina             Austria - France
                             France - Serbia                   Belgium - France
                             Indonesia - Malaysia              Denmark - Netherlands
                             Brazil - Portugal                 El Salvador - Germany

On one hand, most country-pairs are located in the “Different and not collaborating” quadrant. On the other hand, there is a high number of country pairs that are similar in terms of capability but are not collaborating.

Capability correlation and collaboration between country pairs.

Looking at the graph above, one could argue that the number of shared assets is an unfair indicator, because not all countries possess the same number of assets. For example, the US has an extremely high number of documents, while other countries, such as Costa Rica or Lebanon, have very few. For this reason, a new index was created, the normalized number of shared assets, which values collaborations as a percentage of the total documents produced by the country pair. Its definition follows:

  • Old collaboration definition: Country i and country j have z assets that have both their name as location.

  • New normalized collaboration definition: normalized collaboration = (number of shared assets between country i and j)/(number of total possible collaborations between i and j)

For example, for the country pair Portugal-Denmark:

  • Number of assets Denmark: 351

  • Number of assets Portugal: 180

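  • Number of shared assets: 25

  • Number of normalized shared assets: 0.14 (=25/180, where 180, the asset count of the smaller country, is the largest number of assets the pair could share)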
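The Portugal-Denmark example can be reproduced with a short function. The text does not spell out how the "number of total possible collaborations" is obtained; taking it as the asset count of the smaller country is an assumption inferred from the 25/180 in the example, and the function name is hypothetical.

```python
def normalized_collaboration(shared_assets, assets_i, assets_j):
    """Shared assets as a fraction of the largest number the pair could share."""
    # Assumption: a pair cannot share more assets than the smaller country owns.
    possible = min(assets_i, assets_j)
    if possible == 0:
        return 0.0
    return shared_assets / possible

# Values for Denmark (351 assets) and Portugal (180 assets) from the text.
value = normalized_collaboration(25, 351, 180)
print(round(value, 3))
```

With this definition the index always lies between 0 and 1, so pairs involving countries with very different document counts become comparable.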

Below, the same graph is presented, but using the normalized shared assets of each country pair. One can notice that there is generally less saturation and that country pairs are more evenly distributed. Moreover, some outliers appear, such as France-Lebanon.

4.Comparing Countries

Coming back to the more general analysis, in the same way as two years were compared in terms of capability, two countries will now be compared in terms of term pairs usage. It is worth noting that this approach is simply a deep dive into the capability matrices of two different countries and looking at the most common term pairs in each of them.

As an example, Brazil and Denmark will be compared; their capability correlation is around 30%. The first result is the list of top term pairs for each of these countries, presented in the two tables below. One can note that among the top term pairs of Brazil there is a high number of term pairs related to sugar, sugarcane and cellulose. On the other hand, in the Denmark table, there is more stress on processing technologies (digestion, fermentation, hydrolysis) and outputs.

Top terms for Denmark:

First Term           Second Term           Documents  Percentage
anaerobic digestion  biogas                31         0.088319
ethanol              fermentation          26         0.074074
ethanol              hydrolysis            23         0.065527
ethanol              straw                 14         0.039886
bioethanol           fermentation          14         0.039886
yeast                fermentation          12         0.034188
ethanol              enzymatic hydrolysis  12         0.034188
ethanol              cellulosic ethanol    12         0.034188
hydrolysis           bioethanol            12         0.034188
fermentation         cellulosic ethanol    11         0.031339

Top terms for Brazil:

First Term        Second Term         Documents  Percentage
sugarcane         sugar               416        0.525917
ethanol           fermentation        401        0.506953
fermentation      sugar               210        0.265487
ethanol           cellulosic ethanol  208        0.262958
ethanol           sugar               207        0.261694
ethanol           advanced biofuel    200        0.252845
advanced biofuel  cellulosic ethanol  200        0.252845
fermentation      sugarcane           198        0.250316
ethanol           sugarcane           195        0.246523
ethanol           hydrolysis          42         0.053097

Similarly to the macro level analysis, a table of the most important term-pair usage differences was produced. One can note a high number of term pairs that are not used at all in Denmark but are used in Brazil: “sugar-sugarcane”, “advanced biofuel-cellulosic ethanol”, “sugarcane-ethanol”. On the other hand, there is a lower number of term pairs that are only used in Denmark (e.g. “straw-hydrolysis”). Moreover, feedstocks and related term pairs are common in this table, with terms such as sugar, sugarcane, or straw being divisive between the two countries.

Top term pairs usage differences in Denmark and Brazil:

First Term           Second Term          Denmark   Brazil    Difference
sugarcane            sugar                0.000000  0.525917  0.525917
ethanol              fermentation         0.074074  0.506953  0.432879
advanced biofuel     cellulosic ethanol   0.000000  0.252845  0.252845
ethanol              advanced biofuel     0.000000  0.252845  0.252845
fermentation         sugar                0.014245  0.265487  0.251242
fermentation         sugarcane            0.000000  0.250316  0.250316
ethanol              sugar                0.014245  0.261694  0.247449
ethanol              sugarcane            0.000000  0.246523  0.246523
ethanol              cellulosic ethanol   0.034188  0.262958  0.228770
anaerobic digestion  biogas               0.088319  0.015171  0.073148
ethanol              straw                0.039886  0.002528  0.037358
straw                hydrolysis           0.031339  0.000000  0.031339
fermentation         mixed biomass        0.025641  0.000000  0.025641
hydrolysis           bioethanol           0.034188  0.010114  0.024074
straw                wheat                0.022792  0.000000  0.022792
biogas               waste                0.025641  0.003793  0.021848
biodiesel            transesterification  0.000000  0.020228  0.020228
straw                fermentation         0.019943  0.000000  0.019943
vegetable oil        transesterification  0.000000  0.018963  0.018963
fermentation         cellulosic ethanol   0.031339  0.012642  0.018697
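A difference table of this kind can be derived from the per-country shares of each term pair. The sketch below uses a few values copied from the tables above; the function name and the dictionary layout are assumptions, and the share itself is taken to be the number of documents containing the pair divided by the country's total documents, which is consistent with 31/351 ≈ 0.0883 for Denmark but is not stated explicitly in the text.

```python
def usage_differences(share_a, share_b, top=20):
    """Rank term pairs by the absolute difference in usage between two countries.

    share_a, share_b: dict mapping (first_term, second_term) -> share of
    documents; a pair missing from a country counts as 0.
    The two terms of a pair must be written in the same order in both dicts.
    """
    rows = []
    for pair in set(share_a) | set(share_b):
        a = share_a.get(pair, 0.0)
        b = share_b.get(pair, 0.0)
        rows.append((pair, a, b, abs(a - b)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top]

# A handful of shares taken from the Denmark and Brazil tables.
denmark = {
    ("ethanol", "fermentation"): 0.074074,
    ("anaerobic digestion", "biogas"): 0.088319,
    ("straw", "hydrolysis"): 0.031339,
}
brazil = {
    ("sugarcane", "sugar"): 0.525917,
    ("ethanol", "fermentation"): 0.506953,
    ("anaerobic digestion", "biogas"): 0.015171,
}
rows = usage_differences(denmark, brazil)
for pair, dk, br, diff in rows:
    print(pair, dk, br, round(diff, 6))
```

Pairs that appear in only one country end up with a difference equal to their share in that country, which is why feedstock pairs such as sugarcane/sugar dominate the top of the table.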

5.Country Spectrums

Representing Country Spectrums

To dive deeper into the country capability spectrums and understand their composition, the country spectrum concept is developed further in this section, drawing an analogy between term pairs and amino-acid pairs in DNA representations.

Instead of focusing on how frequently a certain term pair appears, let us focus on whether or not it appears in the capability list of a country.

Below, the first 45 term pairs of the capability spectrum are represented for 7 countries. Even though this is a very small part of the spectrum (<1%), one can already see some term pairs that appear in several countries. “Natural Gas / Anaerobic Digestion”, for instance, appears in Finland and Denmark. Moreover, there is a wide range of term pairs that appear in only one country, such as those related to “animal fats” in the case of Spain.

Country capability spectrum (first 45 term pairs) for a set of 7 countries.

Generalizing this capability spectrum concept to all of the countries and all of the term pairs is a good way of visualizing the biofuels capability “DNA” of each of them. However, in order to improve the quality of this visualization, two adjustments were made:

  • The order of the countries in the left hand side was adjusted to reflect the result of the clustering in the country correlation matrix.

  • Only term pairs that were used by at least 2 countries were represented. This allowed the reduction of the original size of the capability spectrum from 58482 values, to 6236 values.

Country capability spectrum for all countries (only term pairs that appear in at least 2 countries).
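The presence/absence view and the "at least 2 countries" filter can be sketched with plain Python sets. The countries "A", "B", "C", the values and the function names are hypothetical and only illustrate the idea.

```python
from collections import Counter

def binary_spectrum(capability_lists, pair_labels):
    """Map each country to the set of term pairs it uses at least once."""
    return {
        country: {label for label, value in zip(pair_labels, values) if value > 0}
        for country, values in capability_lists.items()
    }

def pairs_used_by_at_least(spectrum, min_countries=2):
    """Term pairs that appear in the spectrum of `min_countries` or more countries."""
    counts = Counter(pair for used in spectrum.values() for pair in used)
    return {pair for pair, n in counts.items() if n >= min_countries}

# Hypothetical capability lists over four term pairs.
pair_labels = ["ethanol/fermentation", "sugar/sugarcane", "biogas/waste", "straw/wheat"]
lists = {
    "A": [0.2, 0.0, 0.1, 0.0],
    "B": [0.5, 0.3, 0.0, 0.0],
    "C": [0.0, 0.0, 0.4, 0.0],
}
spectrum = binary_spectrum(lists, pair_labels)
shared = pairs_used_by_at_least(spectrum)
kept = [label for label in pair_labels if label in shared]
print(kept)
```

Dropping pairs used by fewer than two countries removes both the pairs nobody uses and the pairs unique to a single country, which is what shrinks the spectrum to a size that can be drawn.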

The uniqueness of countries

Taking the capability spectrum of a country as a starting point, the next and final step of the analysis seeks to understand how unique each country is in its usage of term pairs. Denmark, for example, uses a total of 256 different term pairs in its capability spectrum. Of these 256 term pairs, a total of 21 are used only by documents located in Denmark:

Term pairs unique to Denmark

  • various grasses/straw

  • various grasses/waste

  • various grasses/garden waste

  • various grasses/ethanol

  • various grasses/enzymatic hydrolysis

  • various grasses/hydrolysis

  • industrial waste/gas cleaning

  • sewage/gas cleaning

  • mixed biomass/biogas

  • mixed biomass/cellulosic ethanol

  • mixed biomass/fermentation

  • straw/garden waste

  • grass/garden waste

  • rapeseed oil/solvents

  • garden waste/ethanol

  • garden waste/enzymatic hydrolysis

  • garden waste/hydrolysis

  • rapeseed/solvents

  • biogas/cellulosic ethanol

Taking this approach and applying it to all of the countries in the database, a uniqueness index was developed. The uniqueness index of a country is the ratio between the number of term pairs that are unique to that country and the total number of term pairs used by that country. In the case of Denmark, this value would be equal to 21/256 = 0.082.
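The uniqueness index can be sketched as follows, assuming each country is represented by the set of term pairs it uses. The function name and the three toy countries are hypothetical; the final line only re-checks the Denmark arithmetic from the text.

```python
from collections import Counter

def uniqueness_index(spectrum):
    """Fraction of each country's term pairs that no other country uses.

    spectrum: dict mapping country -> set of term pairs used by that country.
    """
    # How many countries use each term pair.
    counts = Counter(pair for used in spectrum.values() for pair in used)
    result = {}
    for country, used in spectrum.items():
        unique = sum(1 for pair in used if counts[pair] == 1)
        result[country] = unique / len(used) if used else 0.0
    return result

# Hypothetical spectra for three made-up countries.
spectrum = {
    "A": {"ethanol/fermentation", "biogas/waste"},
    "B": {"ethanol/fermentation", "sugar/sugarcane"},
    "C": {"biogas/waste"},
}
index = uniqueness_index(spectrum)
print(index)

# Denmark's figures from the text: 21 unique pairs out of 256.
denmark_index = 21 / 256
print(round(denmark_index, 3))
```

In the toy data only country B has a term pair that nobody else uses, so it is the only one with a non-zero index.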

With this approach, a table of the top 20 most unique countries was created. In this table, presented below, the most unique country is the US, with an index of almost 0.50. This means that nearly half of the term pairs used by the US are used only by the US (!). Several of the other countries in the ranking have a relatively low number of term pairs: Lebanon, for example, has only 5 term pairs, of which 1 is unique. Overall, the top 20 most unique countries tend to have either a very large or a very low number of term pairs.

Uniqueness ranking:

Country                     Uniqueness  Unique Pairs  Total Pairs
United States of America    0.477474    2215          4639
Ukraine                     0.263158    5             19
Indonesia                   0.210526    12            57
Lebanon                     0.200000    1             5
Bangladesh                  0.200000    7             35
Cyprus                      0.157895    3             19
Sweden                      0.156250    70            448
Austria                     0.150685    11            73
European Patent Office      0.140145    155           1106
El Salvador                 0.132530    11            83
Australia                   0.132296    34            257
Uganda                      0.130435    3             23
South Africa                0.125749    21            167
Hong Kong                   0.125000    3             24
Turkey                      0.120773    25            207
People’s Republic of China  0.113377    139           1226
Thailand                    0.111111    9             81
Brazil                      0.110672    56            506
United Kingdom              0.106870    70            655
Canada                      0.106061    84            792