ELSTE Data Science Master Course

An introduction to data science using Python

Authors and affiliations

Sébastien Biass: Department of Earth Sciences, University of Geneva

Stéphane Guerrier: Department of Earth Sciences, University of Geneva; Section of Pharmaceutical Sciences, University of Geneva

Practical information

Hello and welcome to the website for the Fall part of the ELSTE data science class at the University of Geneva. The objective of this class is to introduce you to some tools - both technical (i.e., Python and dedicated packages) and theoretical (i.e., an understanding of statistical methods) - that will ideally increase your ability to explore and interpret the data you will generate or collect during your Master's theses.

The course will rely on Python, which was selected because it is both free and general purpose. We assume that you are not a complete stranger to Python, but that you haven't yet gained the confidence to use it in your data processing and analysis workflow. The primary objective of this class is to give you that confidence. It would be foolish to believe that this course will be enough to make you an expert, but we hope it will provide you with the tools you need to get started.

You might already be familiar with other free (e.g., R, Julia, Octave) or commercial (e.g., Matlab) scientific programming languages. In this case, the course contains sufficient theoretical material to keep your interest high and, depending on how comfortable you are with your language of choice, you might want to use it rather than Python, at your own risk.

In a nutshell, by the end of the course, you will ideally:

  1. Be familiar with the basic packages and functions in Python for exploring, analysing and visualising your data;
  2. Understand the theory for some of the most frequently used statistical analyses, thus opening the gate to statistical inference.

Course schedule

The course is composed of six 3-h sessions on Wednesday mornings from 9h to 12h, from Oct 15 to Nov 19, 2025. The tentative schedule is as follows (SB: Sébastien Biass; SG: Stéphane Guerrier):

Date          Instructor  Topic
Oct 15, 2025  SB          Introduction to Python, data manipulation
Oct 22, 2025  SB          Exploratory data analysis, plotting
Oct 29, 2025  SG          T-test, ANOVA, multiple comparisons
Nov 5, 2025   SG          Linear regression
Nov 12, 2025  SB/SG       TBD
Nov 19, 2025  SB/SG       TBD

Class location

The class takes place in the computer lab in the Science III building (and not in the Maraîchers building!!) located on Boulevard D’Yvoy. Check out the map below to find the room.

Use of the computer lab

  • Please make sure you have your ISIS login and password to log in
  • Follow this tutorial to set up VS Code on the computer lab PCs

Alternatively, if you want to use Python on your own computer:

  • Make sure you have installed an environment manager (e.g., Miniconda) and created a dedicated environment with the following packages installed: pandas, numpy, jupyterlab
  • Make sure you have installed VS Code and:
    • installed the Python extensions following this tutorial
    • checked that VS Code can find your Python environment
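
One way to check that the environment is set up as described above is to run a short script with the interpreter selected in VS Code. The sketch below is only a suggestion: the names `REQUIRED` and `installed_versions` are ours, not part of any tutorial, and the package list is simply copied from the bullet above. It prints the interpreter path, which should point inside your dedicated environment, and the installed version of each package.

```python
import sys
from importlib import metadata

# Packages listed in the setup instructions above
REQUIRED = ("pandas", "numpy", "jupyterlab")


def installed_versions(packages=REQUIRED):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The path should point inside your dedicated environment
    print("Interpreter:", sys.executable)
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package is reported as not installed, or the interpreter path is not the one you expect, VS Code is probably using a different Python environment from the one you created.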

Using your own computer

We won’t have time to help you install Python on your computer during the class.

Assistants

The course assistants (TAs) are:

The course also benefited from contributions from:

A note on the use of LLMs

We all know that you will sooner or later resort to using LLMs in your coding journey and, if you haven't already, might very well use some for this class. It would be hypocritical of us to pretend we never do (take the cheat sheets at the end of classes 1 & 2 for instance!). The thing is, there is a big difference between using LLMs to save time - which implies that the code produced by an LLM can be read, understood and critically reviewed by the end user - and using LLMs to seek an answer - which implies using a magical black box.

Python is one of the most versatile languages in use today, and scientific coding is only one of its purposes, alongside tasks such as web development, cybersecurity, software and game development, or finance. Each of these domains has a plethora of dedicated packages and libraries, and there is a high probability that a given task can be achieved with more than one library. The problem is that libraries for a given domain are designed to work together, whereas blindly mixing in libraries designed for other domains might lead to inconsistencies in your code.

Therefore, we encourage you to play the game with this class and, for now, restrict your use of LLMs until you are confident that you can read through the code they offer you. If you do use them, take the opportunity to investigate the answers they provide:

  • what functions does it use?
  • what are the arguments of the functions?
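
As an illustration of how these two questions can be investigated from within Python itself, here is a minimal sketch using the standard `inspect` module. The helper `list_arguments` is a hypothetical name of ours, and `json.dumps` is only a stand-in for whatever function an LLM answer happens to use.

```python
import inspect
import json


def list_arguments(func):
    """Map each parameter name of `func` to its default value.

    Parameters without a default are marked '<required>', and
    catch-all parameters (*args, **kwargs) are marked '<variable>'.
    """
    variable_kinds = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    arguments = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in variable_kinds:
            arguments[name] = "<variable>"
        elif param.default is inspect.Parameter.empty:
            arguments[name] = "<required>"
        else:
            arguments[name] = param.default
    return arguments


# Stand-in example: list the arguments of json.dumps and their defaults
args = list_arguments(json.dumps)
for name, default in args.items():
    print(f"{name} = {default!r}")
```

Calling `help(json.dumps)` in a Python session additionally prints the documentation of the function, which is usually the quickest way to check what each argument does.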

We are obviously more than happy to debate this in class should you have a different take!

Re-use this class

This course is openly available in the hope it can benefit others. The course material is published under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence.

Unless otherwise specified, all code snippets and software are published under the GNU GPLv3 licence.