End-to-End Data Science Project Workflow
When you’re working on a data science project - data analysis, ML models, LLM experiments, or quick apps - it’s helpful to have a clear system for developing, publishing, and sharing your work.
This is the workflow I use to go from local notebooks to hosted apps and public writeups. It works well for both solo projects and professional prototypes, and uses open-source and free tools throughout.
1. Develop (local or remote)
- Anaconda to manage Python environments
- JupyterLab for notebooks and analysis
- VS Code for scripts, apps, and package development; integrates with GitHub for version control
- Quarto for writing reports, websites, and blog-style content (supports Jupyter notebooks for code examples)
2. Publish (code, content, apps)
Code & Reports
- GitHub: store code, README files, and notebooks in public and private repositories
- GitHub Pages: host a personal portfolio, Quarto-based sites, and writeups
- nbviewer: fallback viewer for Jupyter notebooks that won’t render on GitHub (a small URL helper follows this list)
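Because nbviewer addresses GitHub-hosted notebooks under a predictable /github/&lt;user&gt;/&lt;repo&gt;/blob/&lt;branch&gt;/&lt;path&gt; path, the fallback link can be built mechanically. A minimal sketch; the helper name and the example repo path are invented for illustration:

```python
def nbviewer_url(github_url: str) -> str:
    """Map a github.com notebook URL to its nbviewer.org equivalent.

    nbviewer serves GitHub-hosted notebooks under /github/<user>/<repo>/blob/...
    """
    prefix = "https://github.com/"
    if not github_url.startswith(prefix):
        raise ValueError("expected an https://github.com/ URL")
    return "https://nbviewer.org/github/" + github_url[len(prefix):]

# Hypothetical repo and path, for illustration only:
print(nbviewer_url("https://github.com/your-name/your-repo/blob/main/analysis.ipynb"))
# -> https://nbviewer.org/github/your-name/your-repo/blob/main/analysis.ipynb
```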
Apps & Demos
- Streamlit: build interactive apps with Python and deploy them for free on Streamlit Community Cloud (minimal sketch after this list)
- Gradio: quickly wrap ML functions into web UIs for demos or prototypes (sketch after this list)
- Hugging Face Spaces: host public demos (Gradio or Streamlit) with no infrastructure setup
- Note: free hosting tiers put idle apps to sleep, so they can be slow to wake after inactivity
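To make the Streamlit path concrete, here is a minimal app sketch. The file name, title, and random-walk demo are illustrative stand-ins; the st.* calls are standard Streamlit API:

```python
# app.py - run locally with: streamlit run app.py
import numpy as np
import pandas as pd
import streamlit as st

st.title("Random walk explorer")  # illustrative title

# One interactive control: the length of the walk
n_steps = st.slider("Number of steps", min_value=10, max_value=1000, value=200)

# Generate a random walk and chart it
walk = pd.DataFrame({"position": np.cumsum(np.random.randn(n_steps))})
st.line_chart(walk)
```

Pushing this file plus a requirements.txt to a public GitHub repo is typically all Streamlit Community Cloud needs to deploy it.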
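The Gradio equivalent wraps a plain Python function in a web UI. Here word_count is a stand-in for a real model’s predict function; gr.Interface and launch() are the standard Gradio entry points:

```python
# demo.py - run locally with: python demo.py
import gradio as gr

def word_count(text: str) -> int:
    """Stand-in for an ML function - swap in your model's predict() here."""
    return len(text.split())

demo = gr.Interface(
    fn=word_count,
    inputs=gr.Textbox(label="Input text"),
    outputs=gr.Number(label="Word count"),
    title="Word counter",  # illustrative
)

if __name__ == "__main__":
    demo.launch()  # local web UI; launch(share=True) gives a temporary public link
```

The same file, renamed app.py in a Hugging Face Space with the Gradio SDK selected, runs as a public demo with no further setup.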
3. Share
- Post projects or insights on LinkedIn (pin key projects to your profile)
- Write articles about the work on Medium, Towards Data Science, or Substack
- Link everything back through your GitHub Pages portfolio site