Statistical data analysis


course ID

Lecturer

CFU: 6

Length: 14 weeks

Semester DD: Second


Course details

Module 1: Statistical data analysis (Dr. Umberto De Sanctis)

Length: 20h lectures

 

  • Basic concepts in probability (frequentist and Bayesian approaches, properties of distributions, characteristic functions)
  • Probability distribution functions and their properties (law of large numbers, central limit theorem, convergence, discrete and continuous distributions)
  • Information theory (likelihood, Fisher information)
  • Decision theory and estimator theory (maximum likelihood principle, least-squares method, Bayesian inference)
  • Point estimation (parameter estimation with uncertainties, bias estimation)
  • Interval estimation (confidence and credible intervals, upper/lower limits, asymptotic approximation)
  • Hypothesis testing (Neyman-Pearson test, likelihood ratio test)
  • Goodness of fit assessment (Pearson chi-square test, tests for unbinned distributions)
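
As a small illustration of the point-estimation and Fisher-information topics listed above, the sketch below computes the maximum-likelihood estimate of the mean of an exponential distribution together with its asymptotic uncertainty. The names Estimate and fitExponentialMean are hypothetical and chosen only for this example; it is a minimal sketch, not material taken from the course.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Result of a point estimate: central value and its 1-sigma uncertainty.
struct Estimate {
    double value;
    double sigma;
};

// Maximum-likelihood estimate of the mean tau for samples drawn from
// f(t; tau) = exp(-t / tau) / tau.
// The log-likelihood is maximised by the sample mean. The Fisher information
// is I(tau) = n / tau^2, so the asymptotic uncertainty is tau_hat / sqrt(n).
Estimate fitExponentialMean(const std::vector<double>& t) {
    assert(!t.empty());
    const double n = static_cast<double>(t.size());
    const double tauHat = std::accumulate(t.begin(), t.end(), 0.0) / n;
    return {tauHat, tauHat / std::sqrt(n)};
}
```

For the four values 1, 2, 3 and 6 the estimate is 3 with an asymptotic uncertainty of 1.5, which shows how slowly the uncertainty shrinks with the sample size.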

 

Bibliography:

  • F. James, “Statistical methods in experimental physics”, 2nd edition, World Scientific, 2006
  • G. Cowan, “Statistical data analysis”, Oxford Science Publications, 1998

 

Module 2: Basics of C++ and ROOT (Dr. Vincenzo Vitale)

Length: 12h (8h lectures + 4h exercises)

 

This module is meant to provide:

- a basic knowledge of the C++ programming language and of the ROOT data analysis toolkit;

- the capability to perform operations such as data I/O and numerical processing, which underlie scientific data analysis;

- the capability to integrate ROOT classes within custom analysis codes.

 

The topics covered within the module are:

 

1) The basics of C++. Functions, types, scope, pointers, arrays, tests;

2) User-Defined Types. Structures, classes, other types;

3) Modularity. Separate compiling, namespaces, error handling;

4) Classes. Concrete types, abstract types, virtual functions, class hierarchies, copy and move;

5) The ROOT software toolkit. The basics, the most frequently used classes;

6) An example of ROOT code for data analysis. Histograms, random filling, fitting, storage on disk.
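
To give an idea of the histogram and random-filling steps in topic 6, here is a minimal sketch in standard C++ only. It deliberately does not use ROOT: Histogram1D and fillGaussian are hypothetical names that stand in for what a ROOT histogram class would provide, and fitting and storage on disk are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Fixed-width 1D histogram with underflow/overflow counters
// (a plain-C++ stand-in for a ROOT histogram, not ROOT's API).
struct Histogram1D {
    double lo, hi;
    std::vector<long> bins;
    long underflow = 0;
    long overflow = 0;

    Histogram1D(std::size_t nBins, double lo_, double hi_)
        : lo(lo_), hi(hi_), bins(nBins, 0) {
        assert(nBins > 0 && hi_ > lo_);
    }

    // Add one entry; values outside [lo, hi) go to underflow/overflow.
    void fill(double x) {
        if (x < lo) { ++underflow; return; }
        if (x >= hi) { ++overflow; return; }
        const auto i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        ++bins[i < bins.size() ? i : bins.size() - 1];  // guard against rounding at the upper edge
    }

    // Total number of entries, including underflow and overflow.
    long entries() const {
        long n = underflow + overflow;
        for (long c : bins) n += c;
        return n;
    }
};

// Fill a 50-bin histogram spanning mean +/- 5 sigma with Gaussian random numbers.
Histogram1D fillGaussian(std::size_t nEvents, double mean, double sigma, unsigned seed) {
    Histogram1D h(50, mean - 5.0 * sigma, mean + 5.0 * sigma);
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(mean, sigma);
    for (std::size_t i = 0; i < nEvents; ++i) h.fill(gauss(rng));
    return h;
}
```

A call such as fillGaussian(10000, 0.0, 1.0, 42u) returns a histogram whose entries() equals the number of generated events. The exact bin contents depend on the standard library implementation of the Gaussian generator.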

 

 

Bibliography:

"A Tour of C++", Bjarne Stroustrup, Addison-Wesley

"C++ guida essenziale per il programmatore", Bjarne Stroustrup, Pearson Italia (Italian translation of the book above)

ROOT documentation at https://root.cern.ch/

 

 

Module 3: Template fits and treatment of systematic uncertainties (Dr. Marco Vanadia)

Length: 8h (2h lecture + 6h exercise)

 

The students will carry out an exercise on the measurement of a physics parameter via a binned maximum-likelihood template fit, with emphasis on the impact of systematic uncertainties on the measurement and on the modelling of their correlations.

 

Lecture: recap of the basics of binned maximum-likelihood template fits and of Asimov datasets, introduction to the analysis software, treatment of systematic uncertainties and typical use cases for analysis.

 

Exercise: the exercise aims to teach students the following techniques:

 

  1. Implementation of histogram production, stat-only ML fits to Asimov data, first implementation of systematic uncertainties in the fit.
  2. Detailed test of the systematic uncertainty model used in the analysis and of the assumptions made in modelling the correlations between uncertainties.
  3. Fit to data, extraction of the physics parameter, evaluation of constraints to systematic uncertainties in data.
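
The first step of the exercise, a stat-only maximum-likelihood fit to Asimov data, can be sketched as follows. This is a simplified illustration under stated assumptions, not the software used in the module: the model has a single free parameter, a signal strength mu scaling a signal template on top of a fixed background template, and no systematic uncertainties are included. The names FitResult, makeAsimov and fitSignalStrength are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct FitResult {
    double mu;     // fitted signal strength
    double sigma;  // uncertainty from the curvature of the negative log-likelihood
};

// Asimov dataset: every bin is set to its expected yield mu * s_i + b_i.
std::vector<double> makeAsimov(const std::vector<double>& sig,
                               const std::vector<double>& bkg, double mu) {
    assert(sig.size() == bkg.size());
    std::vector<double> n(sig.size());
    for (std::size_t i = 0; i < sig.size(); ++i) n[i] = mu * sig[i] + bkg[i];
    return n;
}

// Minimise the binned Poisson negative log-likelihood
//   NLL(mu) = sum_i [ nu_i - n_i * ln(nu_i) ],  nu_i = mu * s_i + b_i,
// with Newton's method, using the analytic first and second derivatives.
FitResult fitSignalStrength(const std::vector<double>& data,
                            const std::vector<double>& sig,
                            const std::vector<double>& bkg) {
    assert(data.size() == sig.size() && sig.size() == bkg.size());
    double mu = 1.0;  // starting value (an arbitrary choice for this sketch)
    double d2 = 0.0;
    for (int iter = 0; iter < 100; ++iter) {
        double d1 = 0.0;
        d2 = 0.0;
        for (std::size_t i = 0; i < data.size(); ++i) {
            const double nu = mu * sig[i] + bkg[i];
            d1 += sig[i] * (1.0 - data[i] / nu);           // dNLL/dmu
            d2 += data[i] * sig[i] * sig[i] / (nu * nu);   // d2NLL/dmu2
        }
        const double step = d1 / d2;
        mu -= step;
        if (std::fabs(step) < 1e-12) break;
    }
    // Curvature taken from the last iterate, where the step is negligible.
    return {mu, 1.0 / std::sqrt(d2)};
}
```

Fitting an Asimov dataset returns the value of mu used to build it, so the quantity of interest is the uncertainty, which gives the expected statistical sensitivity before any systematic uncertainty is added.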

 

Module 4: Unfolding techniques in high energy physics (Dr. Valerio Formato)

Length: 8h (2h lecture + 6h exercise)

 

In this module we will study the effects introduced by instrumental resolution on the measurement of a desired distribution. The students will apply several statistical techniques to recover the underlying distribution and compare the results obtained with these techniques and/or with different regularisations.

 

Lecture: introduction to unfolding and to the problems it tries to solve. Overview of different unfolding methods: bin-to-bin correction, SVD-regularised unfolding, Bayesian unfolding, iterative bin-to-bin and forward folding.

 

Exercise: students will implement one or more of these techniques (depending on the available time) and apply them to different datasets and resolution functions.
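
To make the idea concrete, the sketch below shows forward folding with a response matrix and an iterative Bayesian unfolding loop based on the usual multiplicative update. It is an illustration under stated assumptions rather than the code used in the module: the response matrix is taken as known, uncertainties are not propagated, and the names Vec, Matrix, fold and unfoldIterative are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Matrix = std::vector<Vec>;  // response[j][i] = P(reconstructed in bin j | true bin i)

// Forward folding: expected reconstructed spectrum for a given true spectrum.
Vec fold(const Matrix& response, const Vec& truth) {
    Vec reco(response.size(), 0.0);
    for (std::size_t j = 0; j < response.size(); ++j) {
        assert(response[j].size() == truth.size());
        for (std::size_t i = 0; i < truth.size(); ++i) reco[j] += response[j][i] * truth[i];
    }
    return reco;
}

// Iterative Bayesian unfolding. Starting from a prior estimate, each iteration
// applies
//   t_i <- (t_i / eff_i) * sum_j response[j][i] * measured_j / folded_j,
// where folded = fold(response, t) and eff_i = sum_j response[j][i].
// The number of iterations plays the role of the regularisation strength.
Vec unfoldIterative(const Matrix& response, const Vec& measured, Vec estimate,
                    int nIterations) {
    assert(response.size() == measured.size());
    const std::size_t nTrue = estimate.size();
    Vec efficiency(nTrue, 0.0);
    for (const Vec& row : response) {
        assert(row.size() == nTrue);
        for (std::size_t i = 0; i < nTrue; ++i) efficiency[i] += row[i];
    }
    for (int k = 0; k < nIterations; ++k) {
        const Vec folded = fold(response, estimate);
        Vec next(nTrue, 0.0);
        for (std::size_t j = 0; j < measured.size(); ++j) {
            if (folded[j] <= 0.0) continue;  // empty prediction: bin carries no information
            const double ratio = measured[j] / folded[j];
            for (std::size_t i = 0; i < nTrue; ++i) next[i] += response[j][i] * ratio;
        }
        for (std::size_t i = 0; i < nTrue; ++i) {
            next[i] = efficiency[i] > 0.0 ? estimate[i] * next[i] / efficiency[i] : 0.0;
        }
        estimate = next;
    }
    return estimate;
}
```

With noise-free folded data and an invertible response matrix, many iterations bring the estimate back to the true spectrum. With real, fluctuating data a small number of iterations is normally used, since further iterations amplify statistical fluctuations.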

 

 

Objectives

The course aims to give students the basic notions of the statistical treatment of data in physics and, through dedicated hands-on sessions, to enable them to apply these notions to common practical problems in particle physics.

The course consists of four modules. The first gives an overview of the basic concepts of statistical methods for experimental physics; the second introduces the C++ programming language and the ROOT framework; the other two are hands-on modules with exercises on some of the most relevant problems in high-energy physics data analysis.