Student project streamlines climate action database

Student project streamlines climate action database

Luke Hellum presenting a poster describing his semester of research, “Data Cleaning, Deduplication, and Entry Matching in a Carbon Commitment Database,” at the Yale Statistics & Data Science poster session.

On Thursday, May 2, 2019, Luke Hellum, a Yale College data science and statistics and energy studies major, shared the results of his senior capstone research project. Over the past 6 months, he built a statistical R software package that helps clean and organize the vital, messy universe of climate action commitments from cities, regions, companies, investors, and civil society.

Over the past few years, the Data-Driven Lab has built a database of climate action commitments put forward by a wide range of corporate and local government authorities, alongside contextual information, such as these actors’ population or GDP. This database draws from 17 different sources – all providing important data, in a wide array of different forms and formats. Organizing this data unlocks important information about the scope and impact of this rising tide of “bottom-up” climate action – but often demands time-consuming cleaning and validation.

As the database grew and became more complex, matching new information to existing database entries became an increasingly pressing issue. Luke’s semester-long research project sought to improve the database management and entry matching through a preliminary analysis on Python and a collection of R scripts that improve data cleaning and matching. His analysis honed in on the most persistent problems when matching entries (i.e. encoding, language, typos, and uncleaned names) and provided guidance on how to streamline data entry and management. The new R scripts streamline efforts to update and identify data entries, and will allow future datasets to be managed in a more systematic way.

“Cleaning and wrangling data is the first — and often the most difficult — hurdle in conducting data-driven analysis,” said Dr. Angel Hsu, Luke’s capstone advisor and the Director of Data-Driven Lab. “We plan to road-test this tool over the summer, and look forward to sharing the finished version later this year.”

As he shared his research with students, professors, and visitors at the Statistics and Data Science Poster Session, Luke noted the additional challenges and rewards of working with a real-world dataset: “It was a pleasure having the opportunity to work with Professor Hsu and her team on the project. Going into it, I expected to learn some data science techniques, but I also came out of it with an appreciation of how challenging it can be to clean data and effectively incorporate it into a large and complex database.”






Disasters Expo USA, is proud to be supported by Inergency for their next upcoming edition on March 6th & 7th 2024!

The leading event mitigating the world’s most costly disasters is returning to the Miami Beach

Convention Center and we want you to join us at the industry’s central platform for emergency management professionals.
Disasters Expo USA is proud to provide a central platform for the industry to connect and
engage with the industry’s leading professionals to better prepare, protect, prevent, respond
and recover from the disasters of today.
Hosting a dedicated platform for the convergence of disaster risk reduction, the keynote line up for Disasters Expo USA 2024 will provide an insight into successful case studies and
programs to accurately prepare for disasters. Featuring sessions from the likes of The Federal Emergency Management Agency,
NASA, The National Aeronautics and Space Administration, NOAA, The National Oceanic and Atmospheric Administration, TSA and several more this event is certainly providing you with the knowledge
required to prepare, respond and recover to disasters.
With over 50 hours worth of unmissable content, exciting new features such as their Disaster
Resilience Roundtable, Emergency Response Live, an Immersive Hurricane Simulation and
much more over just two days, you are guaranteed to gain an all-encompassing insight into
the industry to tackle the challenges of disasters.
By uniting global disaster risk management experts, well experienced emergency
responders and the leading innovators from the world, the event is the hub of the solutions
that provide attendees with tools that they can use to protect the communities and mitigate
the damage from disasters.
Tickets for the event are $119, but we have been given the promo code: HUGI100 that will
enable you to attend the event for FREE!

So don’t miss out and register today:

And in case you missed it, here is our ultimate road trip playlist is the perfect mix of podcasts, and hidden gems that will keep you energized for the entire journey


This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Accept Read More