Decoding the Data Ecosystem Institute (DeCoDE Institute) 

Welcome to the Decoding the Data Ecosystem (DeCoDE) Institute, a comprehensive virtual institute and capstone project event on open science, reproducible research, knowledge exchange, and skill development through Common Fund Data Ecosystem (CFDE) web portals, tools, and resources. Grounded in FAIR (Findable, Accessible, Interoperable, Reusable) principles, the CFDE is a unified network for discovering, accessing, and analyzing diverse biomedical datasets to accelerate scientific discovery. 

The institute is hosted by the CFDE Training Center managed by ORAU in collaboration with BioData Sage LLC. 

Registration is required to attend training offerings. Participants may register at any time. Early registration is encouraged as spots in each training are limited.

Overview 

The DeCoDE Institute combines structured hands-on training with collaborative capstone projects to build practical skills in data science, with a focus on FAIR principles and the open-source tools and public data available from the Common Fund Data Ecosystem.   

Virtual live and asynchronous trainings are offered for 8 weeks, from June 1 to July 24, 2026. This is a self-directed learning institute, allowing you to choose the trainings that align with your interests. Live trainings will occur every other week with asynchronous training opportunities in between. Registrants will receive access to all training materials as they become available in our GitHub repository.

The institute is designed for cumulative skill development culminating in a Capstone Project event. The Capstone Project will be half-day sessions on July 28, 29, and 30 with cash prizes for winning teams and a certificate for all our participants. The more DeCoDE Institute trainings you attend, the more skills you’ll have to bring into the Capstone Project! Participants must attend all three days of the Capstone Project event to be eligible for team prizes. 

Additional Information

Graphic: timeline for the summer institute.

  • May 12, 1:00 p.m.: DeCoDE Informational Webinar. Learn all the details about the DeCoDE Institute and get your questions answered.
  • June 1-July 24: Training topics: Intro to Omics, Galaxy/CFDE Cloud Workspace, Coding in R, Version Control, Data Visualizations. Additional trainings available.
  • July 28-30: 3-day capstone event. Put your new skills to the test and gain hands-on experience. Cash prizes for winning teams and a certificate for all participants.

Getting Started

Prerequisites 

  • No prior programming experience required
  • Computer with internet connection 

Workshop Materials 

Each GitHub weekly folder will include:  

  • Presentation slides 
  • Interactive exercises 
  • Cloud Workspace links 
  • Additional resources 

Technical Setup 

We will use the CFDE Cloud Workspace for many of our live trainings.

  • No local installation required: everything runs in your browser
  • Pre-configured environments with Galaxy, R, and specialized packages 
  • Persistent workspace: your work is automatically saved 

Learning Objectives 

By completing the institute trainings, participants will be able to:  

  • Develop skills to analyze omics datasets and build workflows using CFDE tools and resources. 
  • Engage in collaborative, hands-on learning experiences and foster innovation through live training sessions. 
  • Utilize CFDE resources and e-learning modules for flexible, asynchronous learning opportunities. 
  • Demonstrate their knowledge and skills by participating in team-based capstone projects. 

Contact & Support 

Full Schedule

Registration is required to attend training. Participants may register at any time. Early registration is encouraged as spots in each training session are limited. 

All times listed are in Eastern Time.

DATE AND TIME TITLE AND OVERVIEW
Tuesday, May 12
1:00 p.m. to 2:00 p.m.
DeCoDE Institute Informational Webinar  
Tuesday, June 2
1:00 p.m. to 3:00 p.m.
Part 1: Overview of CFDE and Data Scavenger Hunt
This session provides a foundational overview of the Common Fund Data Ecosystem (CFDE), highlighting its purpose, structure, and role in advancing biomedical research. Participants will engage in a hands-on Data Scavenger Hunt designed to familiarize them with CFDE resources, data and tools, fostering an interactive and practical understanding of the ecosystem.  
Thursday, June 4
1:00 p.m. to 5:00 p.m.
Part 2: Introduction to Omics and CFDE Resources
This session introduces participants to the omics fields (e.g., genomics, transcriptomics, proteomics, metabolomics, epigenomics), covering their significance and applications in biomedical research. Following the introduction, participants will explore CFDE web portals and tools, gaining insight into their functionality and learning how to navigate and use them for data-driven research.
Week of June 8
Asynchronous Learning – Explore CFDE tools and resources (CFDE 101 as primary learning)
Thursday, June 11
1:00 p.m. to 2:00 p.m.
Speed Networking: CFDE Connections
This session brings together researchers from the CFDE community and beyond for a fast-paced virtual networking experience. Participants will have the opportunity to meet new collaborators and share their research across data programs and scientific domains. Whether you're a student just beginning your research journey or already diving into topics, this is your chance to expand your network and spark meaningful connections within the CFDE community!
Monday, June 15
1:00 p.m. to 4:00 p.m.
Part 1: Applied AI for Biologists Using Galaxy and CFDE Cloud Resources
This training introduces participants to Galaxy and the CFDE Cloud Workspace, an innovative platform for data analysis and workflow management. Participants will learn about the workspace's features, tools, and capabilities, setting the stage for advanced data integration and analysis in subsequent sessions.
Wednesday, June 17
1:00 p.m. to 4:00 p.m.
Part 2: Applied AI for Biologists Using Galaxy and CFDE Cloud Resources 
Building on the foundational knowledge of Galaxy and the Cloud Workspace, this session dives into practical applications by integrating CFDE datasets with AI tools. Participants will get hands-on practice in leveraging AI-driven methodologies to answer meaningful biological questions and enhance their data processing skills. 
Week of June 22
Asynchronous Learning – Deepen your understanding of workflow development and optimization
Tuesday, June 30
1:00 p.m. to 5:00 p.m.
Part 1: An Introduction to R and Data Wrangling with the Tidyverse
This session introduces the fundamentals of programming in R using an interactive R Notebook environment. Participants will learn core concepts such as objects, data types, packages, and basic data exploration through hands-on exercises. The session also introduces the tidyverse ecosystem, with a focus on data manipulation using the dplyr package. Participants will work through real-world scenarios using essential functions for selecting, filtering, transforming, summarizing, and grouping data.
Wednesday, July 1
1:00 p.m. to 5:00 p.m.
Part 2: Joining, Pivoting, and Plotting in R
This session builds on foundational data wrangling skills by introducing more advanced techniques for combining, reshaping, and visualizing data in R. Participants will learn how to merge datasets with dplyr and reshape data with tidyr, including pivoting data between long and wide formats to prepare it for analysis. Finally, participants will explore multiple types of data visualization using ggplot2, creating informative and customizable plots. Hands-on exercises will reinforce these concepts through practical, real-world examples.
Week of July 6
Asynchronous Learning – Further enhance your programming knowledge
Week of July 13
Let’s take a Summer Break! No sessions or asynchronous learning this week.
If you’re in the Washington, D.C. area, join us at the Intelligent Systems for Molecular Biology (ISMB) Conference. Learn more at https://www.iscb.org/ismb2026/home
Monday, July 20
1:00 p.m. to 3:00 p.m.
Reproducible Data Science with Git and GitHub 
This workshop introduces the fundamentals of version control using Git and GitHub and explains their role in reproducible and collaborative data science. Participants will learn how to track changes, document their work, and manage project history using Git within RStudio. 
Wednesday, July 22
1:00 p.m. to 5:00 p.m.
Data Visualization Essentials 
This workshop introduces key considerations for creating effective data visualizations, including how to choose appropriate plots and communicate insights clearly. Participants will learn core design principles, accessibility practices, and strategies to avoid misleading or distorted visuals. Through examples, attendees will develop skills to evaluate and create clear, informative, and publication-ready figures.
CAPSTONE EVENT BEGINS *Must attend all three days to be eligible for Team Prize
Tuesday, July 28
Noon to 5:00 p.m.
Day 1 Kick-off and Team Projects begin 
Wednesday, July 29
Noon to 5:00 p.m.
Day 2 Team Projects continue 
Thursday, July 30
Noon to 5:00 p.m.
Day 3 Team Projects Wrap-Up and Final Presentations