Complementary Analysis

Publications and Patents

The analysis of patents and publications sought to determine whether the two are unbalanced and whether they show any out-of-the-ordinary behaviour.

As one would normally expect, scientific publications lead in quantity, and the data confirm this: the number of scientific publications related to biofuels rose sharply from 2006 to 2016. Contrary to expectations, however, patents appear to be at the origin of this surge, and between 2006 and 2011 patents were more prominent than publications. Patents seem to guide the direction of research. When their number rises, publications tend to rise after some delay, and when patents fall, publications fall sharply. This could be explained by biofuels being an “industry-first” technology that is also maturing and is well past its exploration phase. The same pattern recurs for individual feedstock terms, where a small change in the rate of patenting seems to disturb the rate of research on that term after a certain delay.

Comparing the rates of patenting and publication for each term shows which terms are more mature and which are still in the research/publication stage. Generally speaking, patenting and publication appear to be balanced for most terms. One would expect some terms to stall in the research phase and never be taken up in patents. However, the imbalance observed for some terms (such as anaerobic digestion) may instead indicate that they are still in their research phase and will be adopted later.
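The per-term comparison can be summarised as the share of a term's documents that are patents. A share near 0 suggests a term still in the research stage, and a share near 1 suggests a patent-heavy term. The sketch below assumes patent and publication counts per term over the same period. The counts and the 0.2/0.8 thresholds are invented for illustration, and `patent_share` and `flag_imbalanced` are hypothetical helpers.

```python
import math


def patent_share(patents, publications):
    """Fraction of a term's documents that are patents (0 = publications only)."""
    total = patents + publications
    return patents / total if total else math.nan


def flag_imbalanced(counts, low=0.2, high=0.8):
    """Label terms whose patent share falls outside the [low, high] band (NaN is skipped)."""
    flags = {}
    for term, (patents, publications) in counts.items():
        share = patent_share(patents, publications)
        if share < low:
            flags[term] = "research stage"
        elif share > high:
            flags[term] = "patent-heavy"
    return flags


# Invented (patents, publications) counts per term, for illustration only.
counts = {
    "ethanol": (400, 500),
    "wood": (120, 150),
    "anaerobic digestion": (30, 300),
}
for term, (pat, pub) in counts.items():
    print(f"{term}: patent share = {patent_share(pat, pub):.2f}")
print(flag_imbalanced(counts))
```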

One limitation of this complementary study is that it analyses the global ecosystem, which is quite saturated at that scale. Moreover, a time-series analysis would probably say more about the relationship between patenting and publications. Finally, using single terms rather than term pairs can also be seen as a limitation, since term pairs add granularity. Such an extension would, of course, have to be done carefully given the very high number of term pairs.
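The following calculation gives a sense of scale. If every unordered pair of n distinct terms were treated as a candidate feature, the number of term pairs would grow quadratically with the vocabulary. The vocabulary sizes below are illustrative and are not the number of terms used in this thesis.

```latex
P(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad P(100) = 4\,950,
\qquad P(1\,000) = 499\,500
```

Suppose pairs are instead restricted to one feedstock term and one processing term, as in “sugar/anaerobic digestion”. For f feedstock terms and p process terms the count is then f × p, which still grows quickly as either list gets longer.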

Alternative Methods

Another part of the complementary analysis applied unsupervised learning techniques to the data in order to find patterns.

However, applying principal component analysis (PCA) and the t-SNE algorithm showed that these techniques are not as directly applicable or explanatory as the system perspective, because the transformed data did not reveal an evident clustering of the countries.
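For reference, PCA of a country-by-term frequency matrix amounts to centring each term column and taking a singular value decomposition. The sketch below uses NumPy on a toy matrix of four hypothetical countries and three terms, and `pca_project` is a hypothetical helper. t-SNE is not reimplemented here. scikit-learn's `sklearn.manifold.TSNE` is one widely used implementation.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X (countries) onto the first principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # centre each term column
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)            # share of variance per component
    scores = Xc @ Vt[:n_components].T          # country coordinates on the components
    return scores, explained[:n_components]


# Hypothetical term-frequency matrix: rows = countries, columns = terms.
countries = ["A", "B", "C", "D"]
X = [
    [5, 0, 1],
    [4, 1, 0],
    [0, 5, 4],
    [1, 4, 5],
]
scores, explained = pca_project(X)
for name, (pc1, pc2) in zip(countries, scores):
    print(f"{name}: PC1={pc1:+.2f} PC2={pc2:+.2f}")
print("explained variance ratio:", np.round(explained, 3))
```

In this toy matrix, A and B fall on one side of the first component and C and D fall on the other. With real frequency data, such a separation need not appear.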

An obvious limitation is the limited depth of this part of the analysis. Many other techniques for clustering, association mining, and density estimation exist for inferring structure from data. However, applying them was deemed to be outside the scope of this thesis.

Clustering Quality

When the clusters built from term frequency are compared with those built from term-pair frequency, it is clear that term pairs produce a much cleaner and more reasonable clustering. As predicted by the AMICa pathfinder briefing, a term on its own is much less explanatory than its several associations. For example, if only the term “anaerobic digestion” is observed, all countries that use this processing technology are deemed similar. If instead the term pairs “sugar/anaerobic digestion” and “wood/anaerobic digestion” are observed, the clustering treats them as different features and groups the countries that use each combination separately. This gives the analysis much more granularity.
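A toy example makes the difference between the two feature sets concrete. Each hypothetical country is a list of documents, and each document is a set of extracted terms. Term features are the terms a country mentions, while term-pair features are the pairs that co-occur within the same document. The countries, documents, and the term “fermentation” are invented, and the helper names are illustrative.

```python
from itertools import combinations


def term_features(documents):
    """Single terms mentioned in any of a country's documents."""
    return {term for doc in documents for term in doc}


def term_pair_features(documents):
    """Unordered term pairs that co-occur within the same document."""
    return {pair for doc in documents for pair in combinations(sorted(doc), 2)}


def jaccard(a, b):
    """Share of features two countries have in common (1.0 = identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical countries: same terms overall, but combined differently.
country_a = [{"sugar", "anaerobic digestion"}, {"wood", "fermentation"}]
country_b = [{"wood", "anaerobic digestion"}, {"sugar", "fermentation"}]

print(jaccard(term_features(country_a), term_features(country_b)))            # terms only: 1.0
print(jaccard(term_pair_features(country_a), term_pair_features(country_b)))  # term pairs: 0.0
```

On single terms the two countries look identical. On term pairs they share nothing, which is the extra granularity that term pairs bring to the clustering.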

Of course, judging clustering quality only conceptually and visually can be seen as a limitation. Measures such as the Rand index, the Jaccard similarity, or the entropy of partitions exist to quantify the quality of a clustering. Although applying these indicators is beyond the scope of this project, they could be applied with relative ease.
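The pair-counting measures mentioned above are straightforward to compute. The sketch below compares two hypothetical partitions of six countries, one imagined as coming from term frequency and one from term-pair frequency. The labels are invented and the function names are illustrative. Here the Jaccard similarity is the pair-counting version: pairs grouped together in both partitions, divided by pairs grouped together in either. The entropy is that of a single partition, the quantity from which entropy-based comparisons such as conditional entropy are built.

```python
from collections import Counter
from itertools import combinations
from math import log2


def pair_counts(labels_a, labels_b):
    """Classify every pair of items by whether each partition groups them together."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same items")
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def rand_index(labels_a, labels_b):
    """Share of item pairs on which the two partitions agree (needs >= 2 items)."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    return (both + neither) / (both + only_a + only_b + neither)


def jaccard_index(labels_a, labels_b):
    """Pairs grouped together in both partitions over pairs grouped together in either."""
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    together = both + only_a + only_b
    return both / together if together else 1.0


def partition_entropy(labels):
    """Shannon entropy (bits) of the cluster-size distribution of one partition."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


# Invented cluster labels for six countries under two feature sets.
by_terms = [0, 0, 0, 1, 1, 1]
by_term_pairs = [0, 0, 1, 1, 2, 2]

print(f"Rand index:    {rand_index(by_terms, by_term_pairs):.3f}")
print(f"Jaccard index: {jaccard_index(by_terms, by_term_pairs):.3f}")
print(f"H(terms):      {partition_entropy(by_terms):.3f} bits")
print(f"H(pairs):      {partition_entropy(by_term_pairs):.3f} bits")
```

For these labels, the Rand index is 10/15 and the Jaccard index is 2/7. scikit-learn also provides `sklearn.metrics.rand_score` and `adjusted_rand_score`, and the latter corrects the Rand index for chance agreement.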