ML-As-2

Point Estimation

The Poisson distribution is a useful discrete distribution that can be used to model the number of occurrences of an event per unit time. For example, in networking, packet arrival density is often modeled with the Poisson distribution. If $X$ is Poisson distributed, i.e., $X \sim \text{Poisson}(\lambda)$, its probability mass function takes the following form:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots$$

It can be shown that $\mathbb{E}[X] = \lambda$. Assume now we have $n$ i.i.d. data points $X_1, \dots, X_n$ from $\text{Poisson}(\lambda)$. (For the purpose of this problem, you can only use the knowledge about the Poisson and Gamma distributions provided in this problem.)

(a)

Show that the sample mean $\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is the maximum likelihood estimate (MLE) of $\lambda$ and that it is unbiased ($\mathbb{E}[\hat{\lambda}] = \lambda$).

Finding the MLE

The log-likelihood of the data is

$$\ell(\lambda) = \log \prod_{i=1}^{n} \frac{\lambda^{X_i} e^{-\lambda}}{X_i!} = \sum_{i=1}^{n} \left( X_i \log \lambda - \lambda - \log X_i! \right).$$

Setting its derivative to zero,

$$\frac{d\ell}{d\lambda} = \frac{1}{\lambda} \sum_{i=1}^{n} X_i - n = 0 \quad \Longrightarrow \quad \hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} X_i,$$

so the MLE is the sample mean. (The second derivative, $-\sum_i X_i / \lambda^2 \le 0$, confirms this critical point is a maximum.)

Unbiasedness

Since $X_1, \dots, X_n$ are i.i.d., we can take the expectation inside the sum:

$$\mathbb{E}[\hat{\lambda}] = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[X_i] = \frac{1}{n} \cdot n\lambda = \lambda.$$

Therefore, $\mathbb{E}[\hat{\lambda}] = \lambda$, confirming that $\hat{\lambda}$ is an unbiased estimator of $\lambda$.
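As a quick sanity check (not part of the proof), the claim can be verified numerically: for simulated Poisson data, the log-likelihood peaks at the sample mean. The rate $\lambda = 3$ and the sample size below are arbitrary illustration choices, and the inversion sampler is just a dependency-free stand-in.

```python
import math
import random

def poisson_log_likelihood(lam, data):
    """Sum of log P(X_i = k) = k*log(lam) - lam - log(k!)."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)

def sample_poisson(lam):
    """Simple Poisson sampler via CDF inversion (illustration only)."""
    u, k, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

random.seed(0)
true_lam = 3.0
data = [sample_poisson(true_lam) for _ in range(500)]
sample_mean = sum(data) / len(data)

# The log-likelihood at the sample mean beats nearby and distant candidates.
ll_hat = poisson_log_likelihood(sample_mean, data)
assert all(poisson_log_likelihood(lam, data) <= ll_hat
           for lam in (sample_mean - 0.5, sample_mean + 0.5, 1.0, 5.0))
print(f"MLE (sample mean) = {sample_mean:.3f}")
```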

(b)

Now let's be Bayesian and put a prior distribution over $\lambda$. Assuming that $\lambda$ follows a Gamma distribution with the parameters $(\alpha, \beta)$, its probability density function is

$$p(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda},$$

where $\Gamma(\alpha) = (\alpha - 1)!$ (here we assume $\alpha$ is a positive integer). Compute the posterior distribution $p(\lambda \mid X_1, \dots, X_n)$.

By Bayes' rule,

$$p(\lambda \mid X_1, \dots, X_n) \propto p(\lambda) \prod_{i=1}^{n} p(X_i \mid \lambda) \propto \lambda^{\alpha - 1} e^{-\beta \lambda} \cdot \lambda^{\sum_i X_i} e^{-n\lambda} = \lambda^{\alpha + \sum_i X_i - 1} e^{-(\beta + n)\lambda}.$$

Let $\alpha' = \alpha + \sum_{i=1}^{n} X_i$ and $\beta' = \beta + n$. Then the posterior is still a Gamma distribution:

$$\lambda \mid X_1, \dots, X_n \sim \text{Gamma}(\alpha', \beta').$$
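The conjugate update can also be sanity-checked numerically. The prior parameters and the counts below are made-up illustration values; the check confirms that prior $\times$ likelihood differs from the Gamma$(\alpha', \beta')$ density only by a constant factor.

```python
import math

def gamma_log_pdf(lam, alpha, beta):
    """Log-density of Gamma(alpha, beta) in the rate parameterization."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1) * math.log(lam) - beta * lam)

def posterior_params(alpha, beta, data):
    """Conjugate update: alpha' = alpha + sum(data), beta' = beta + n."""
    return alpha + sum(data), beta + len(data)

alpha, beta = 2, 1           # made-up prior parameters
data = [3, 1, 4, 2, 2]       # made-up Poisson counts
a_post, b_post = posterior_params(alpha, beta, data)  # (14, 6)

def log_unnorm_posterior(lam):
    """log prior + log likelihood, up to the evidence constant."""
    loglik = sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)
    return gamma_log_pdf(lam, alpha, beta) + loglik

# The log-ratio to the Gamma(a_post, b_post) density is constant in lambda.
ratios = [log_unnorm_posterior(l) - gamma_log_pdf(l, a_post, b_post)
          for l in (0.5, 1.0, 2.0, 3.0)]
assert max(ratios) - min(ratios) < 1e-9
print(f"posterior: Gamma(alpha'={a_post}, beta'={b_post})")
```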

(c)

Derive an analytic expression for the maximum a posteriori (MAP) estimate of $\lambda$ under the Gamma prior.

Prior Distribution

$$p(\lambda) \propto \lambda^{\alpha - 1} e^{-\beta \lambda}$$

Likelihood function

$$p(X_1, \dots, X_n \mid \lambda) \propto \lambda^{\sum_i X_i} e^{-n\lambda}$$

Multiplying the two gives the Gamma$(\alpha', \beta')$ posterior from part (b). Setting the derivative of its log-density, $(\alpha' - 1) \log \lambda - \beta' \lambda + \text{const}$, to zero yields

$$\hat{\lambda}_{\text{MAP}} = \frac{\alpha' - 1}{\beta'} = \frac{\alpha - 1 + \sum_{i=1}^{n} X_i}{\beta + n}.$$

Source of Error: Part 1

(a)

The bias of an estimator $\hat{\theta}$ of a parameter $\theta$ is defined as

$$\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta.$$

For the constant estimator $\hat{\theta} = 1$, the bias is

$$\text{Bias}(\hat{\theta}) = 1 - \theta.$$

The variance of an estimator is defined as

$$\text{Var}(\hat{\theta}) = \mathbb{E}\left[ \left( \hat{\theta} - \mathbb{E}[\hat{\theta}] \right)^2 \right],$$

which is $0$ here, since the estimator ignores the data entirely.

This is not a good estimator, since the bias is large when the true value of $\theta$ is not 1. Usually we don't have any information about the true value of $\theta$, so it is unreasonable to assume it is equal to 1.

(b)

Since $\mathbb{E}[X_1] = \theta$, the bias is 0, so this is an unbiased estimator. The variance of this estimator is

$$\text{Var}(X_1) = \sigma^2.$$

This is not a good estimator, since its variability does not decrease with the sample size.

(c)

Bias of the estimator $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$:

$$\mathbb{E}[\bar{X}] - \theta = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[X_i] - \theta = \theta - \theta = 0.$$

Variance of the estimator $\bar{X}$:

$$\text{Var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^{n} \text{Var}(X_i) = \frac{\sigma^2}{n},$$

which shrinks as the sample size grows, so this is the best of the three estimators.
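The bias/variance comparison can be checked with a short Monte Carlo simulation. This sketch assumes the three estimators are the constant 1, the single observation $X_1$, and the sample mean $\bar{X}$, with data drawn from a Gaussian; $\theta = 3$, $\sigma = 2$, and $n = 50$ are made-up illustration values.

```python
import random
import statistics

random.seed(1)
theta, sigma, n, trials = 3.0, 2.0, 50, 5000

def run(estimator):
    """Empirical (bias, variance) of an estimator over many repeated samples."""
    estimates = []
    for _ in range(trials):
        xs = [random.gauss(theta, sigma) for _ in range(n)]
        estimates.append(estimator(xs))
    return statistics.fmean(estimates) - theta, statistics.pvariance(estimates)

bias_a, var_a = run(lambda xs: 1.0)                  # (a) constant estimator
bias_b, var_b = run(lambda xs: xs[0])                # (b) first observation
bias_c, var_c = run(lambda xs: statistics.fmean(xs)) # (c) sample mean

assert abs(bias_a - (1.0 - theta)) < 1e-9 and var_a == 0.0
assert abs(bias_b) < 0.1 and abs(var_b - sigma**2) < 0.3      # Var ~ sigma^2
assert abs(bias_c) < 0.1 and abs(var_c - sigma**2 / n) < 0.02 # Var ~ sigma^2/n
```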

Source of Error: Part 2

(a)

(b)

The error is equal to 0, because the supports of $p(x \mid y = 0)$ and $p(x \mid y = 1)$ do not overlap.

To classify a point, just check whether it is in the interval $[-4, -1]$ or in the interval $[1, 4]$.
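The rule above can be written as a two-line classifier. Which label goes with which interval is not stated, so the 0/1 assignment below is an assumption:

```python
def classify(x):
    """Zero-error rule: the class supports [-4, -1] and [1, 4] do not overlap."""
    if -4 <= x <= -1:
        return 0  # assumed label for the [-4, -1] class
    if 1 <= x <= 4:
        return 1  # assumed label for the [1, 4] class
    raise ValueError("x lies outside both supports")

assert classify(-2.5) == 0 and classify(2.5) == 1
```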

(c)

(d)

  • $\mu_0 = -2.5$ and $\sigma_0^2 = \frac{(-1 - (-4))^2}{12} = 0.75$ (using the variance formula for the uniform distribution),
  • $\mu_1 = 2.5$ and $\sigma_1^2 = 0.75$.

Since we are approximating using a normal distribution, we have:

  • $p(x \mid y = 0) \approx \mathcal{N}(x; -2.5, 0.75)$,
  • $p(x \mid y = 1) \approx \mathcal{N}(x; 2.5, 0.75)$.

Using these, for $x \in [1, 4]$, we find $p(y = 1 \mid x) > p(y = 0 \mid x)$, and for $x \in [-4, -1]$, $p(y = 0 \mid x) > p(y = 1 \mid x)$. Therefore, the classifier will make no error in classifying new points.
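A grid check of this conclusion, using the approximated class-conditionals $\mathcal{N}(-2.5, 0.75)$ and $\mathcal{N}(2.5, 0.75)$ and assuming equal class priors:

```python
import math

def normal_log_pdf(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def classify(x):
    """Pick the class whose approximate Gaussian density is larger (equal priors)."""
    return int(normal_log_pdf(x, 2.5, 0.75) > normal_log_pdf(x, -2.5, 0.75))

# Every point of [-4, -1] goes to class 0 and every point of [1, 4] to class 1,
# so the classifier makes no error on points drawn from the true uniforms.
xs_neg = [-4 + 3 * i / 100 for i in range(101)]  # grid over [-4, -1]
xs_pos = [1 + 3 * i / 100 for i in range(101)]   # grid over [1, 4]
assert all(classify(x) == 0 for x in xs_neg)
assert all(classify(x) == 1 for x in xs_pos)
```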

(e)

Given a finite amount of data, we will not learn the mean and variance of each class perfectly. Therefore, the classifier's error will increase due to the limited data. In this scenario, we would have both bias and variance error in our model.

Gaussian (Naïve) Bayes and Logistic Regression

No, the new $P(Y \mid X)$ is no longer of the form used by logistic regression.

The log ratio of class-conditional probabilities:

$$\log \frac{p(X_i \mid Y = 1)}{p(X_i \mid Y = 0)} = \log \frac{\sigma_{i0}}{\sigma_{i1}} - \frac{(X_i - \mu_{i1})^2}{2\sigma_{i1}^2} + \frac{(X_i - \mu_{i0})^2}{2\sigma_{i0}^2}$$

Simplifies to:

$$\left( \frac{1}{2\sigma_{i0}^2} - \frac{1}{2\sigma_{i1}^2} \right) X_i^2 + \left( \frac{\mu_{i1}}{\sigma_{i1}^2} - \frac{\mu_{i0}}{\sigma_{i0}^2} \right) X_i + \text{const},$$

where the quadratic terms no longer cancel, because the class-dependent variances $\sigma_{i1} \ne \sigma_{i0}$.

Probability of $Y = 1$:

$$P(Y = 1 \mid X) = \frac{P(Y = 1) \prod_i p(X_i \mid Y = 1)}{P(Y = 1) \prod_i p(X_i \mid Y = 1) + P(Y = 0) \prod_i p(X_i \mid Y = 0)}$$

Simplifies to:

$$P(Y = 1 \mid X) = \frac{1}{1 + \exp\left( w_0 + \sum_i w_i X_i + \sum_i v_i X_i^2 \right)},$$

which is a sigmoid of a quadratic, not linear, function of $X$, and hence does not match the logistic regression form $1 / \left( 1 + \exp\left( w_0 + \sum_i w_i X_i \right) \right)$.
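The non-linearity claim can be illustrated numerically for a single feature: with class-dependent variances, the log-odds has a nonzero quadratic coefficient in $x$, which shows up as a constant, nonzero second difference. The parameter values below are made up.

```python
import math

mu0, var0 = 0.0, 1.0
mu1, var1 = 1.0, 4.0  # different variance -> x^2 terms do not cancel

def log_odds(x):
    """log N(x; mu1, var1) - log N(x; mu0, var0), equal class priors."""
    def logn(x, mu, var):
        return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return logn(x, mu1, var1) - logn(x, mu0, var0)

# Quadratic coefficient is 1/(2*var0) - 1/(2*var1) = 0.375 here. For a
# quadratic, second differences are constant (= 2 * coeff * h^2) and nonzero.
h = 0.5
second_diffs = [log_odds(x + h) - 2 * log_odds(x) + log_odds(x - h)
                for x in (-2.0, 0.0, 3.0)]
assert all(abs(d - second_diffs[0]) < 1e-9 for d in second_diffs)
assert abs(second_diffs[0]) > 1e-6  # nonzero -> log-odds is not linear in x
```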