ML-As-1

Vector Calculus Review (15 pts)

When $f : \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable, the Hessian $\nabla^2 f(x)$ is a symmetric matrix.

Bayes’ Rule (10 pts)

Assume the probability that a person has a certain disease is $P(D)$. The probability of testing positive given that a person is infected with the disease is $P(+ \mid D)$, and the probability of testing positive given that the person is not infected with the disease is $P(+ \mid \lnot D)$.

(a) Calculate the probability of testing positive, $P(+)$.

  • $P(D)$: The probability that a person has the disease.
  • $P(+ \mid D)$: The probability of testing positive given that the person has the disease.
  • $P(+ \mid \lnot D)$: The probability of testing positive given that the person does not have the disease.
  • $P(\lnot D) = 1 - P(D)$: The probability that a person does not have the disease.

By the law of total probability,

$$P(+) = P(+ \mid D)\,P(D) + P(+ \mid \lnot D)\,P(\lnot D).$$

(b) Use Bayes' Rule to calculate the probability of being infected with the disease given that the test is positive.

$$P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+)}.$$
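As a numerical sanity check, here is a small Python sketch of parts (a) and (b). The probability values below are illustrative placeholders, not the values given in the problem statement; substitute the actual numbers before relying on the output.

# Illustrative placeholder values -- substitute the numbers given in the problem.
p_d = 0.01                 # P(D): prior probability of having the disease
p_pos_given_d = 0.95       # P(+|D): probability of testing positive if infected
p_pos_given_not_d = 0.05   # P(+|~D): probability of testing positive if not infected
p_not_d = 1 - p_d          # P(~D)

# (a) Law of total probability: P(+) = P(+|D) P(D) + P(+|~D) P(~D)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * p_not_d

# (b) Bayes' rule: P(D|+) = P(+|D) P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(f"P(+)   = {p_pos:.4f}")
print(f"P(D|+) = {p_d_given_pos:.4f}")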

Gradient Descent Mechanics (20 pts)

Gradient descent is the primary algorithm for finding optimal parameters for our models. Typically, we want to solve optimization problems stated as

$$\min_{\theta} \; L(\theta) = \frac{1}{N} \sum_{i=1}^{N} f_i(\theta),$$

where the $f_i$ are differentiable functions. In this example, we look at a simple supervised learning problem where, given a dataset $\{(x_i, y_i)\}_{i=1}^{N}$, we want to find the optimal parameters $\theta$ that minimize some loss. We consider different models for learning the mapping from input $x$ to output $y$, and examine the behavior of gradient descent for each model.
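As a generic illustration of this setup, the sketch below runs plain gradient descent on an arbitrary differentiable objective supplied as a gradient function; the quadratic example objective, starting point, learning rate, and iteration count are arbitrary choices for demonstration.

import numpy as np

def gradient_descent(grad_fn, theta0, eta=0.1, num_steps=100):
    # Repeatedly step against the gradient: theta <- theta - eta * grad L(theta).
    theta = theta0
    for _ in range(num_steps):
        theta = theta - eta * grad_fn(theta)
    return theta

# Example: minimize L(theta) = (theta - 4)^2, whose gradient is 2 * (theta - 4).
theta_star = gradient_descent(lambda theta: 2 * (theta - 4.0), theta0=0.0)
print(theta_star)   # should be close to 4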

(a)

The simplest parametric model entails learning a single-parameter constant function, $f_\theta(x) = \theta$, where we use the squared-error loss. We wish to find

$$\theta^{*} = \arg\min_{\theta} \; L(\theta) = \arg\min_{\theta} \; \frac{1}{2N} \sum_{i=1}^{N} (\theta - y_i)^2.$$

i. What is the gradient of $L(\theta)$ with respect to $\theta$?

$$\frac{\partial L}{\partial \theta} = \frac{1}{N} \sum_{i=1}^{N} (\theta - y_i)$$

ii. What is the optimal value of $\theta$?

$$\theta^{*} = \frac{1}{N} \sum_{i=1}^{N} y_i$$

when

$$\frac{\partial L}{\partial \theta} = \frac{1}{N} \sum_{i=1}^{N} (\theta - y_i) = 0.$$

iii. Write the gradient descent update.

$$\theta \leftarrow \theta - \eta \, \frac{1}{N} \sum_{i=1}^{N} (\theta - y_i),$$

where $\eta$ is the learning rate.

iv. Stochastic Gradient Descent (SGD) is an alternative optimization algorithm where, instead of using all $N$ samples, we use a single sample per optimization step to update the model. What is the contribution of each data point to the full gradient update?

Each data point contributes a term $\frac{1}{N}(\theta - y_i)$ to the full gradient. Thus, the gradient descent update for a single data point is:

$$\theta \leftarrow \theta - \eta \, (\theta - y_i).$$

In SGD, this single-sample gradient update is used to update $\theta$ after each data point.
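As a quick illustration, here is a minimal NumPy sketch of the full-batch and stochastic updates for the constant model, assuming the squared loss above; the synthetic targets, learning rate, and iteration counts are arbitrary choices for demonstration.

import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.0, size=100)   # synthetic targets (arbitrary)
eta = 0.1                                      # learning rate (illustrative)

# Full-batch gradient descent: dL/dtheta = (1/N) * sum_i (theta - y_i)
theta = 0.0
for _ in range(200):
    grad = np.mean(theta - y)
    theta -= eta * grad

# SGD: one sample per step, per-sample gradient (theta - y_i)
theta_sgd = 0.0
for _ in range(5):                             # a few passes over the data
    for y_i in rng.permutation(y):
        theta_sgd -= eta * (theta_sgd - y_i)

# Both should approach the sample mean of y, the optimum from part ii.
print(theta, theta_sgd, y.mean())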

(b)

Instead of constant functions, we now consider a single-parameter linear model $f_\theta(x) = \theta x$, where we search for $\theta$ such that

$$\theta^{*} = \arg\min_{\theta} \; L(\theta) = \arg\min_{\theta} \; \frac{1}{2N} \sum_{i=1}^{N} (\theta x_i - y_i)^2.$$

i. What is the gradient of $L(\theta)$ with respect to $\theta$?

$$\frac{\partial L}{\partial \theta} = \frac{1}{N} \sum_{i=1}^{N} (\theta x_i - y_i)\, x_i$$

ii. What is the optimal value of $\theta$?

$$\theta^{*} = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i^2}$$

when

$$\frac{\partial L}{\partial \theta} = \frac{1}{N} \sum_{i=1}^{N} (\theta x_i - y_i)\, x_i = 0.$$

iii. Write the gradient descent update.

$$\theta \leftarrow \theta - \eta \, \frac{1}{N} \sum_{i=1}^{N} (\theta x_i - y_i)\, x_i,$$

where $\eta$ is the learning rate.

iv. Do all points get the same vote in the update? Why or why not?

No, not all points get the same vote in the gradient update. Each data point's contribution to the gradient, $(\theta x_i - y_i)\, x_i$, is weighted by $x_i$. If $|x_i|$ is large, that data point has a larger influence on the gradient (and thus on the update), whereas if $|x_i|$ is small, its influence is smaller.
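A similar sketch for the linear model, again with arbitrary synthetic data and learning rate; it also compares the result of gradient descent against the closed-form optimum from part ii.

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=100)            # synthetic inputs (arbitrary)
y = 1.5 * x + rng.normal(scale=0.1, size=100)   # targets with true slope 1.5
eta = 0.05                                      # learning rate (illustrative)

theta = 0.0
for _ in range(500):
    # dL/dtheta = (1/N) * sum_i (theta * x_i - y_i) * x_i
    grad = np.mean((theta * x - y) * x)
    theta -= eta * grad

# Closed-form optimum from part ii: sum(x_i * y_i) / sum(x_i^2)
theta_closed_form = np.sum(x * y) / np.sum(x ** 2)
print(theta, theta_closed_form)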

MAP Interpretation of Ridge Regression (20 pts)

Consider the Ridge Regression estimator

$$\hat{w}_{\text{ridge}} = \arg\min_{w} \; \|y - Xw\|_2^2 + \lambda \|w\|_2^2.$$

We know this is solved by

$$\hat{w}_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y.$$

One interpretation of Ridge Regression is to find the Maximum A Posteriori (MAP) estimate of $w$, the parameters, assuming that the prior of $w$ is $\mathcal{N}(0, \sigma^2 I)$ with $\sigma^2 = 1/\lambda$, and that the random $Y$ is generated using

$$Y = Xw + Z.$$

Note that each entry of the noise vector $Z$ is zero-mean, unit-variance normal. Show that $\hat{w}_{\text{ridge}}$ is indeed the MAP estimate for $w$ given an observation $y$ of $Y$.


The MAP estimate maximizes the posterior distribution:

$$\hat{w}_{\text{MAP}} = \arg\max_{w} \; p(w \mid y).$$

From Bayes' rule:

$$p(w \mid y) \propto p(y \mid w)\, p(w).$$

Substituting the likelihood and prior:

$$p(y \mid w) \propto \exp\!\left(-\tfrac{1}{2}\|y - Xw\|_2^2\right), \qquad p(w) \propto \exp\!\left(-\tfrac{\lambda}{2}\|w\|_2^2\right).$$

Maximizing this expression with respect to $w$ is equivalent to minimizing its negative logarithm, i.e. minimizing the following expression:

$$\|y - Xw\|_2^2 + \lambda \|w\|_2^2,$$

which is exactly the Ridge Regression objective. Its minimizer is therefore $\hat{w}_{\text{MAP}} = (X^\top X + \lambda I)^{-1} X^\top y = \hat{w}_{\text{ridge}}$.
Programming (35 pts)

Task 1

# You should return your result. 

import numpy as np 

def insertSecond(a, b):
    # Insert value b at index 1 (the second position) of array a.
    return np.insert(a, 1, b)

assert np.array_equal(insertSecond(np.array([-5,-10,-12,-6]),5), np.array([-5, 5, -10, -12, -6]))
assert np.array_equal(insertSecond(np.array([1,2,3]),7), np.array([1, 7, 2, 3]))
assert np.array_equal(insertSecond(np.array([-5,-10,-12,-6]),8), np.array([ -5, 8, -10,-12, -6]))
assert np.array_equal(insertSecond(np.array([1,2,3]),12), np.array([1, 12, 2, 3]))

Task 2

import numpy as np 

def mergeArrays(a, b):
    # Concatenate both arrays and drop duplicates; np.unique already
    # returns the unique values in sorted order.
    return np.unique(np.concatenate((a, b)))

# Test cases 
assert np.array_equal(mergeArrays(np.array([1,1,4,8,1]), np.array([2, 3])), np.array([1, 2, 3, 4, 8])) 
assert np.array_equal(mergeArrays(np.array([-5,-10,-10,-6]), np.array([-5, 8, -10, -12,-6])),np.array([-12, -10, -6, -5, 8]) )
assert np.array_equal(mergeArrays(np.array([1,1,6,8,1]), np.array([2, 3])), np.array([1, 2, 3, 6, 8]))

Task 3

import numpy as np
import matplotlib.pyplot as plt

# data to plot
n_groups = 5
men_means = (22, 30, 33, 30, 26)
women_means = (25, 32, 30, 35, 29)

fig, ax = plt.subplots()
index = np.arange(n_groups)
bar_width = 0.4
opacity = 0.5

# grouped bar chart: one pair of bars (Men, Women) per group
rects1 = plt.bar(index, men_means, bar_width,
                 alpha=opacity,
                 color='g',
                 label='Men')

rects2 = plt.bar(index + bar_width, women_means, bar_width,
                 alpha=opacity,
                 color='r',
                 label='Women')

plt.xlabel('Person')
plt.ylabel('Scores')
plt.title('Scores by person')
plt.xticks(index + bar_width / 2, ('G1', 'G2', 'G3', 'G4', 'G5'))
plt.legend()

plt.tight_layout()
plt.show()

[Output: grouped bar chart comparing Men and Women scores across groups G1–G5.]

Task 4

import pandas as pd


def setDataFrameZeros(df):
    # Find the rows and columns that contain at least one zero
    # (computed up front, before any values are overwritten).
    rows = df.isin([0]).any(axis=1)
    cols = df.isin([0]).any(axis=0)
    # Set every such row and column entirely to zero.
    df.loc[rows, :] = 0
    df.loc[:, cols] = 0
    return df

# Test cases
df1 = pd.DataFrame({'c1': [1, 4, 7], 'c2': [2, 0, 8], 'c3': [3, 6, 9]})
df2 = pd.DataFrame({'c1': [1, 0, 7], 'c2': [0, 0, 0], 'c3': [3, 0, 9]})
assert df2.equals(setDataFrameZeros(df1))

df1 = pd.DataFrame({'c1': [0, 3, 1], 'c2': [1, 4, 3], 'c3': [2, 5, 1], 'c4': [0, 2, 5]})
df2 = pd.DataFrame({'c1': [0, 0, 0], 'c2': [0, 4, 3], 'c3': [0, 5, 1], 'c4': [0, 0, 0]})
assert (df2.equals(setDataFrameZeros(df1)))
