In their book Generalized Linear Models (New York: Chapman & Hall, 1983), P. McCullagh and
J.A. Nelder used Poisson regression to study the ship dataset. Poisson regression is a
special case of the generalised linear model in which the target variable, or dependent
variable, is Poisson distributed. Because one of the main applications of Poisson regression
is fitting linear models to count data, we can use it to predict the number of incidents
(which are counts) given some input variables.
Mathematically, Poisson regression is a linear model in which the logarithm of the expected
value of the target variable Y is given by

$$\log E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k,$$

where β0 is the intercept and β1, β2, …, βk are the coefficients of the independent variables
X1, X2, …, Xk. E(Y) is the predicted, or expected, value of Y, which is transformed by the
natural logarithm.
(a) Find the corresponding scikit-learn module on the official scikit-learn website and
discuss the module, the estimator, and the fit and predict functions, as well as their
parameters, in your own words.
(b) Analyse the data by fitting a Poisson regression based on the DataFrames X and Y
generated in Question 1. Follow the instructions on the official website and report the
parameters of the estimated model. Create a Python program to fit a Poisson regression
and generate a table or a DataFrame to present the coefficients with the corresponding
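The fit requested in (b) could be sketched roughly as follows, assuming the estimator in question is `sklearn.linear_model.PoissonRegressor`. The DataFrames X and Y from Question 1 are not shown in this section, so the data, the column names `x1` and `x2`, and the coefficient values used to simulate them are placeholders invented purely for illustration. Setting `alpha=0.0` is also an assumption: it switches off the estimator's default L2 penalty so that the result is an unpenalised fit.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import PoissonRegressor

# Placeholder data: the real X and Y come from Question 1 (not shown here).
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
log_mean = 0.5 + 0.3 * X["x1"] - 0.2 * X["x2"]  # hypothetical beta values
Y = pd.Series(rng.poisson(np.exp(log_mean)), name="incidents")

# alpha=0.0 removes the default L2 regularisation (assumption: an
# unpenalised fit is wanted); max_iter is raised to help convergence.
model = PoissonRegressor(alpha=0.0, max_iter=1000)
model.fit(X, Y)

# Table of the estimated intercept and coefficients.
coef_table = pd.DataFrame(
    {
        "variable": ["intercept", *X.columns],
        "coefficient": [model.intercept_, *model.coef_],
    }
)
print(coef_table)

# predict() returns E(Y) on the original scale, i.e. the exponential of
# the linear predictor beta_0 + beta_1 * X_1 + ... + beta_k * X_k.
linear_predictor = model.intercept_ + X.to_numpy() @ model.coef_
expected_Y = model.predict(X)
```

Here `expected_Y` equals `np.exp(linear_predictor)`, which is the log link of the model read in the other direction.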
ANL252 Copyright © 2021 Singapore University of Social Sciences (SUSS) Page 6 of 6
ECA – July Semester 2021
(c) The deviance between Y and its expected value E(Y), estimated by the model constructed in
(b), measures the goodness of fit of the model. The lower the deviance, the better the
model. Below is the equation of how it should be calculated.
If Y = 0, the expression log[Y/exp(E(Y))] will be taken as zero. Employ your own
Python program to compute D without using the score() function of the scikit-learn
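As a rough illustration of computing the deviance by hand, the sketch below assumes the standard Poisson deviance, D = 2 Σ [Y log(Y / μ) − (Y − μ)], where μ is the fitted mean on the original scale (the output of `predict`). The paper's own equation is not reproduced in this text and may be written differently, so the formula, the function name `poisson_deviance`, and the small arrays are assumptions for illustration only. The zero-count convention follows the rule stated above: the logarithmic term is taken as zero when Y = 0.

```python
import numpy as np
from sklearn.metrics import mean_poisson_deviance

def poisson_deviance(y, mu):
    """Standard Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu))."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Where y == 0 the ratio is replaced by 1 so that y * log(ratio) = 0.
    ratio = np.where(y > 0, y / mu, 1.0)
    log_term = y * np.log(ratio)
    return 2.0 * np.sum(log_term - (y - mu))

# Made-up observed counts and fitted means, for illustration only.
y_obs = np.array([0, 1, 3, 2, 5])
mu_hat = np.array([0.5, 1.2, 2.5, 2.0, 4.0])

D = poisson_deviance(y_obs, mu_hat)
print(D)

# Sanity check against scikit-learn's mean Poisson deviance, which is the
# same quantity divided by the number of observations.
D_check = len(y_obs) * mean_poisson_deviance(y_obs, mu_hat)
```

The hand-written function does not call `score()`. The `mean_poisson_deviance` call is there only to confirm that the two values agree.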
----- END OF ECA PAPER -----