Machine learning to optimize biofuel production

Left: MIT D-Lab Research Scientist Dan Sweeney. Right: Johlesa Orm '26. Photo: MIT D-Lab
Left: MIT D-Lab Research Scientist Dan Sweeney. Right: Johlesa Orm '26. Photo: MIT D-Lab
MIT D-Lab


Starting spring of 2023, I had the chance to join MIT D-Lab’s Biomass Fuel and Cookstoves group to research biochar production optimization using machine learning. It has been a great privilege working on this project with my PI, D-Lab Research Scientist Daniel Sweeney, and direct supervisor, Priyabrata Pradhan, a D-Lab postdoc and Fulbright Fellow. I’ve gained a tremendous amount of machine learning experience and have learned so much about the charcoal-making process. This summer, I had the chance to participate in the MIT Energy Initiative (MITEI) summer program and learned more about climate and energy accessibility, and the importance of work like ours in bridging the energy gap.

Seeking new insights on the relationship between feedstock and temperature for "green" charcoal production

Many people around the world still use charcoal because of its accessibility and its ability to enhance certain cuisines. However, charcoal is traditionally made with wood, and when many people depend on wood-based charcoal, it leads to issues like deforestation. The good news is that charcoal can be made from pretty much anything that has carbon in it. At D-Lab, we’re focusing on transforming biomass, or materials like agricultural waste, into green charcoal.

To make green charcoal from a non-traditional material like agricultural residues, we need to understand the relationships between the material and heating process and their effect on charcoal output. A lot of chemical reactions take place during the charcoal-making process, or pyrolysis, which are difficult to control. However, once we gain an understanding of pyrolysis, we can abstract our problem and more practically follow a general procedure in order to maximize charcoal output from a given material. (For example, We have X amount of hydrogen and carbon in our material, so we should use X temperature and heating rate.)

Since experimentation to figure out these patterns takes a long time, we instead used machine learning to analyze data from related, past experiments and doubly find those patterns for us.

Diagram
Figure 1: A diagram of the different types of information that are being given to the machine learning model: chemical information, the heating process, and the resulting charcoal output. A machine learning model will use decision trees and statistics to draw out important patterns from given data. We are interested in seeing what the model has to say about the effect of certain features, like temperature, on the amount and quality of biochar produced.

Lower temperature and heating rate led to higher biochar yield

My work began by using the machine learning model, Random Forest. I used 152 data points, collected from available research articles and spanning many different materials, from sludge to coconut shells. I then ran feature importance tests to understand what input data was most influential on the amount and quality of biochar produced. Here is a feature importance chart I generated for a Random Forest model trained on 80% of the collected data:

Graph
Figure 2: This is a feature importance chart generated from a Random Forest model that was trained on 80% of our 152 data points, and tested on the remaining 20%. Features affecting biochar yield are listed on the left axis, and percent importance, listed in tenths, is on the bottom axis. Here, the chart shows that around 80% of biochar yield is determined by pyrolysis temperature, and nearly 10% is determined by heating rate.


The feature importances on this chart are listed in descending order. Here, for biochar yield alone, the most influential process parameter, or feature, was temperature. In combination with other statistical tools, like Pearson Correlation Coefficient charts, we saw that higher temperatures often lead to lower biochar yield. Ultimately, we found that a combination of lower temperature and heating rate led to higher biochar yield.

Graph
Figure 3: This is a two-way partial dependence plot, showing the interaction between temperature and heating rate and their effects on biochar yield. High temperature and heating rate have an overall negative effect on biochar yield. However, after decreasing the heating rate, partial dependence slowly rises, then spikes at around 10 C/min. The same trend follows for temperature, with a spike at around 400C.

Exploring alternate machine learning models

Over the summer, I continued analyzing the different factors of biochar production and explored different machine learning models with the hopes of finding one that could best predict biochar yield and quality from the same 152 data points. These other models were K Nearest Neighbors (KNN), Gradient Boosting, Lasso Regression, and Artificial Neural Networks (ANN-MLP).

And indeed, after learning about the different algorithms and implementing them, some of the models surpass Random Forest in terms of accuracy. One of them was Gradient Boosting, which works by penalizing errors as it “learns”. The other was ANN-MLP, a very sophisticated model that learns from its errors by sending information backwards in its decision-making and updating the weight, or importance, of a feature proportional to how much it’s responsible for error.


Interestingly, the best-performing model, Gradient Boosting, gave a slightly different feature importance chart than the one given by Random Forest:

Graph
Figure 4: This is a feature importance chart generated from a Gradient Boosting model that was trained on 80% of our 152 data points, and tested on the remaining 20%. Features affecting biochar yield are listed on the left axis, and percent importance, listed in tenths, is on the bottom axis. Here, the chart shows that over 50% of biochar yield is determined by pyrolysis temperature, and around 10% is determined by ash content. In yellow are the error bars for each feature’s importance. The red and green dots represent the Pearson Correlation Coefficients for each feature, and indicate whether higher values of those features generally result in greater biochar yield (green), or less (red).

Ash comes out on top

Instead of “Heating Rate” being listed as the second most important feature, “Ash” was. This was a curious result! This leads me to my final remark, which is that more work and more representative data need to be collected before settling on the final result. We can also ask more questions, like which subsets of input data we can isolate to predict yield, or whether imputing missing variables affects model performance.

Information local biochar producers can use

Creating a quality model will help us determine a useful range to employ for certain features like temperature. This information will be useful for local biochar producers, including D-Lab partners around the world, and will aid in process improvement. This fall, I will help my supervisors produce a paper with our findings and explore some of these additional questions. This research will be shared with community partners associated with D-Lab, and I am very excited to see how this project will affect future biochar production processes so that more people can access clean-burning cooking fuels.


About the author: Johlesa Orm is a sophomore studying AI and Decision-Making at MIT. She enjoys taking part in projects that challenge her and allow her to contribute to improving communities and the environment.


More information

MIT D-Lab Biomass Fuels and Cookstoves research group

Contact

Dan Sweeney, MIT D-Lab Research Scientist, Biomass Fuels and Cookstoves lead