Macro Level: Time

A look at the whole dataset

The first result worth discussing comes from applying the developed model to the whole database, over all years.

When clustering is applied to the capability matrix of the entire database, the natural relationships between terms can be observed. As described, the most related terms appear together in this clustering. For example, terms related to sugar, straw, waste or wood show a high relationship in terms of usage. This serves as a validation that the clustering makes sense. Two different phenomena resulted from this analysis and are worth discussing. First, most terms are clustered because of their scientific similarity; this happens particularly with feedstock terms. Second, some other terms, such as outputs, are clustered in a less straightforward way. For example, biogas, ethanol, and other types of fuels are also clustered together. This second clustering can be due to a variety of factors. One possibility is that these outputs are researched using the same feedstocks or processing technologies. Another is that they are clustered because they are intensely researched and are “the goal” of most technological assets.
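To make the idea of term relatedness concrete, the sketch below assumes that the capability matrix counts how many records mention each pair of terms together; that definition, the sample records, and the function names are assumptions made for illustration only. It does not reproduce the clustering itself: it only looks up, for each term, the term it co-occurs with most often.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each one is the set of framework terms found
# in a single patent or publication.
records = [
    {"sugar", "ethanol", "fermentation"},
    {"sugar", "straw", "ethanol"},
    {"wood", "straw", "gasification"},
    {"wood", "waste", "biogas"},
    {"waste", "biogas", "anaerobic digestion"},
    {"sugar", "ethanol"},
]

def capability_matrix(records):
    """Count how many records contain each unordered pair of terms."""
    pairs = Counter()
    for terms in records:
        for a, b in combinations(sorted(terms), 2):
            pairs[(a, b)] += 1
    return pairs

def strongest_partner(pairs):
    """For every term, return (partner, count) for its most frequent partner."""
    best = {}
    for (a, b), n in pairs.items():
        for term, other in ((a, b), (b, a)):
            if term not in best or n > best[term][1]:
                best[term] = (other, n)
    return best

pairs = capability_matrix(records)
partners = strongest_partner(pairs)
```

In this toy data, sugar and ethanol end up as each other's strongest partner, which mirrors the feedstock and output relationship discussed above. A full analysis would feed such a matrix into a proper clustering routine.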

Correlation of Years

When observing the capability correlation matrix of the years, and particularly its clustered version, one cannot help but notice that, as time goes on, years are increasingly related to each other. For example, the most recent years (2010-2017) form a cluster in which the correlation is at least 70%. On the other hand, from 2004 to about 2009, years have on average a far lower correlation with each other. Finally, the years before 2004 make up the cluster of least related years. This might indicate a decrease in experimentation in biofuels research during 2004-2009, followed by a consolidation of the field in recent years. This consolidation can be related to factors such as globalization, increased knowledge sharing, or the prevalence of information systems such as the internet, among others.
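The correlation between years can be sketched as follows, assuming that each year is represented by a vector of normalized term-pair usage in a fixed order. The vectors and the function names are hypothetical; only the use of the Pearson correlation comes from the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)

# Hypothetical yearly capability vectors: share of the same five term
# pairs, listed in the same order for every year.
year_vectors = {
    2003: [0.40, 0.10, 0.30, 0.20, 0.00],
    2010: [0.10, 0.30, 0.20, 0.25, 0.15],
    2017: [0.12, 0.28, 0.18, 0.25, 0.17],
}

def year_correlations(vectors):
    """Pearson correlation for every pair of distinct years."""
    years = sorted(vectors)
    return {
        (a, b): pearson(vectors[a], vectors[b])
        for a in years
        for b in years
        if a < b
    }

corr = year_correlations(year_vectors)
```

With these invented vectors, 2010 and 2017 are strongly correlated while 2003 is not, which is the kind of pattern the clustered matrix makes visible.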

Looking at the correlation of each year with the year before it, some years appear to be almost unrelated to their predecessor. From 1999 to 2001, the correlation seems to increase, until it drops considerably in 2002 and then increases steadily again. This seems to indicate that there are “gaps” or “breaks” in scientific research, particularly apparent in 2002 and 2006. These gaps might be related to a change in scientific direction, the appearance of a breakthrough technology, or more generally, something that leads researchers to stop using a certain set of term combinations and start using a different one. Moreover, this correlation behaviour might be connected to the fact that less data is available for earlier years; the capability matrices for those years are therefore sparser, which leads to a lower correlation (see limitations).
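One simple way to locate such breaks is to flag every year whose correlation with the previous year falls sharply. The sketch below is only an illustration: the correlation values are invented to follow the shape described above, and the 0.15 threshold is an arbitrary assumption.

```python
# Hypothetical correlation of each year with the year before it.
prev_year_corr = {
    1999: 0.35, 2000: 0.48, 2001: 0.55, 2002: 0.20, 2003: 0.42,
    2004: 0.50, 2005: 0.58, 2006: 0.31, 2007: 0.60,
}

def find_breaks(series, min_drop=0.15):
    """Return the years whose value is at least min_drop below the preceding year's."""
    years = sorted(series)
    return [
        year
        for prev, year in zip(years, years[1:])
        if series[prev] - series[year] >= min_drop
    ]

breaks = find_breaks(prev_year_corr)  # [2002, 2006] for the values above
```

Because sparse early years can also depress the correlation, a flagged year is a candidate for closer inspection rather than evidence of a break on its own.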

Another point worth noting, and one that can be easily validated, concerns the concept of first, second, and third generation biofuels (Saladini et al. 2016). First generation biofuels are mainly related to crops and feedstocks used for human consumption, second generation biofuels are related to feedstocks not consumable by humans, and third generation biofuels are related to aquatic feedstocks. The normalized evolution of terms is compatible with this concept. When looking at the evolution of algae, sugar and wood, one can notice that sugar has considerable usage in early years, and that wood follows with a high frequency after 1995. Finally, in recent years, the prevalence of the term algae is clear.
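The normalized evolution of a term can be read as the share of that year's records that mention the term. The sketch below assumes this definition; the counts, the years, and the function names are hypothetical and were chosen only to reproduce the ordering of sugar, wood and algae described above.

```python
# Hypothetical number of records mentioning each term, per year.
term_counts = {
    "sugar": {1990: 30, 2000: 60, 2015: 150},
    "wood": {1990: 10, 2000: 90, 2015: 200},
    "algae": {1990: 2, 2000: 15, 2015: 400},
}
# Hypothetical total number of records in the database, per year.
total_records = {1990: 100, 2000: 400, 2015: 2000}

def normalized_share(counts, totals):
    """Share of each year's records that mention the term."""
    return {year: counts[year] / totals[year] for year in counts}

def peak_year(counts, totals):
    """Year in which the term has its highest normalized share."""
    shares = normalized_share(counts, totals)
    return max(shares, key=shares.get)

peaks = {term: peak_year(counts, total_records) for term, counts in term_counts.items()}
```

Note that the raw count of every term grows over time in this toy data; only after dividing by the yearly total does the succession of generations become visible.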

In summary, several explanations are possible for the relationship between the years and the evolution of the research landscape. One should note that the terms in the framework are based on all years; it is therefore expected that the matrix becomes more and more saturated as the years go by.

Comparison of Years

When comparing two years in the database and the main term pairs that make up their capability matrices, some results are worth discussing.

In the comparison of 2010 and 2017, the two years generally share the same term pairs. However, the percentages of these term pairs can differ greatly. This suggests that the research ecosystem is more “distributed” than in older years, which is consistent with the considerations in the previous section.

Furthermore, some areas of research have markedly increased or decreased in frequency. This behaviour suggests that some areas of research lose or gain interest over time. The reasons can range from a general loss of interest to commercialization or a general trend towards a particular research area. Anaerobic digestion, for example, sees an increase of about 1%, which can indicate a focus of the ecosystem on this particular processing technology. Although 1% might not sound like an important increase, it carries weight when the global panorama is considered.

The comparison of two years through the Pearson correlation index also serves as a proxy for their similarity. This similarity (or lack of it) can be due to a high number of small changes in term pair usage (likely), a number of large changes (less likely), or a combination of both (likely).
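The distinction between small and large changes can be made explicit by computing the change in share of every term pair and splitting the results at a threshold. The following sketch uses invented shares and an arbitrary threshold of 0.5 percentage points; the term pairs and the function names are assumptions made for illustration.

```python
# Hypothetical share (in percent) of each term pair in two yearly
# capability matrices.
shares_2010 = {
    ("anaerobic digestion", "biogas"): 2.0,
    ("ethanol", "sugar"): 5.0,
    ("algae", "biodiesel"): 1.5,
    ("ethanol", "straw"): 3.0,
}
shares_2017 = {
    ("anaerobic digestion", "biogas"): 3.0,
    ("ethanol", "sugar"): 4.2,
    ("algae", "biodiesel"): 1.7,
    ("ethanol", "straw"): 2.9,
}

def share_changes(before, after):
    """Percentage-point change per term pair; a missing pair counts as 0%."""
    pairs = set(before) | set(after)
    return {p: after.get(p, 0.0) - before.get(p, 0.0) for p in pairs}

def split_by_size(changes, threshold=0.5):
    """Separate changes into large (>= threshold in magnitude) and small ones."""
    large = {p: d for p, d in changes.items() if abs(d) >= threshold}
    small = {p: d for p, d in changes.items() if abs(d) < threshold}
    return large, small

changes = share_changes(shares_2010, shares_2017)
large, small = split_by_size(changes)
```

Counting the entries in each group gives a rough indication of whether a low correlation between two years comes from a few large shifts or from many small ones.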


Contextualization

The contextualization part of this analysis is perhaps one of its most interesting aspects. This is because it tries to understand the relationship between certain external factors and the research ecosystem.

The adjusted price of a barrel of oil was chosen as an indicator mainly because of its reach as an external factor. The price of oil is known to influence a range of macroeconomic and microeconomic indices. When comparing it to the frequency of output terms over the years in the research ecosystem, some correlations are surprisingly high (butanol 85%, bioplastic 80%). A positive correlation means that the higher the price of oil, the larger the number of records produced with that term. Several factors could explain this:

  • Oil becomes more valuable, and therefore biofuels become more valuable as a way of addressing competition. This could lead to increased research.

  • Oil becomes expensive, and companies look for alternate fuels, therefore increasing R&D focus on alternatives.

Butanol, biodiesel, and bioethanol are all used as alternative fuels, and all have a correlation of at least 70%. To understand the nature of this relationship more precisely, more research would be needed, perhaps including a time series analysis.
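A first step towards such a time series analysis could be to check whether research output follows the price of oil with a delay. The sketch below computes the Pearson correlation at a given lag in years. Both series are invented (the record counts were constructed to follow the price exactly one year later), and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)

def lagged_correlation(driver, response, lag):
    """Correlate driver[t] with response[t + lag] for a non-negative lag."""
    if len(driver) != len(response):
        raise ValueError("series must have the same length")
    if lag < 0 or len(driver) - lag < 2:
        raise ValueError("lag leaves fewer than two overlapping points")
    x = driver[:len(driver) - lag]
    y = response[lag:]
    return pearson(x, y)

# Hypothetical yearly series, aligned on the same consecutive years.
oil_price = [30, 35, 50, 70, 95, 60, 80, 100]
butanol_records = [5, 6, 7, 10, 14, 19, 12, 16]

same_year = lagged_correlation(oil_price, butanol_records, 0)
one_year_later = lagged_correlation(oil_price, butanol_records, 1)
```

In this constructed example the correlation is higher at a lag of one year than in the same year, which is the pattern one would expect if research reacted to prices with a delay. Whether the real data behave this way remains to be tested.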

The price of a kilo of sugar and its influence on the biofuel research ecosystem bring to light a more important series of facts, but also questions. One would naturally think that the price of sugar affects “sugar related” research. The analysis carried out confirms this: sugarcane and cellulosic sugars, for instance, have an 80% correlation with the price of sugar. However, some questions surface when trying to understand exactly why research using a good increases when the price of that good increases. One would expect that, if the price of a good increases, access to it becomes more economically challenging and research using it therefore decreases, causing a negative correlation. An opposing view is that, as a good becomes more valuable, the interest in it from an R&D perspective becomes greater. For now, this research can confirm that a relationship exists between the price of goods and the research that uses them, but the nature of that relationship remains to be determined.


Limitations

When studying the system at a macro level, there are a few limitations that should be pointed out.

Firstly, the study covers the entire system in a given year. This might not seem like much, but in fact the study covers all of the patents and publications, from every organization, in every country in the world. One can therefore expect a large part of the chronological behaviour to be rather “smooth”, because it is very possible that a given phenomenon self-corrects when looking at the whole world. The system per se can be categorized as saturated.

Secondly, the volume of data is not uniform over the years. In fact, there is an explosion in the number of technological assets in more recent years. Although this was addressed by normalizing, one cannot forget that the quality of documentation and general access to information have improved in recent years, making recent documentation far richer than earlier documentation. This might make the analysis rather unbalanced chronologically.

Finally, the use of the Pearson correlation index can also be seen as a limitation. Although this indicator is a mathematically sound measure of the linear relationship between two series, it says little about the exact nature of that relationship. When analyzing the influence of external factors such as the price of sugar, carrying out a time series analysis would allow a better understanding of the nature of the behavior observed.
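The following sketch illustrates this limitation under stated assumptions. Two invented series that both grow over time show a high Pearson correlation in levels, yet their year-to-year changes are negatively correlated. The numbers and names are hypothetical; differencing is shown only as one simple time series technique, not as the method used in this study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)

def first_differences(series):
    """Year-to-year changes of a series."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical yearly series that share an upward trend.
sugar_price = [10, 12, 11, 14, 13, 16, 15, 18]
sugar_records = [100, 110, 125, 130, 145, 150, 165, 170]

r_levels = pearson(sugar_price, sugar_records)
r_changes = pearson(first_differences(sugar_price), first_differences(sugar_records))
```

Here the correlation in levels mostly reflects the shared trend, while the correlation of changes tells a different story. A single Pearson value cannot separate the two, which is why a dedicated time series analysis would be informative.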