## Performing Hypothesis Test For The Difference In Population Proportions

**FEEDBACK (This is what you’re answering based on the discussion post and Python script below 😊):**

Can you define the null and alternative hypotheses in mathematical terms? Please begin by defining P1 and P2: define P1 as the proportion of bearings < 2.2 cm in the first population and P2 as the proportion in the second. Now state the null and alternative hypotheses in terms of P1 and P2. Note that we are testing whether the two proportions are significantly different or not.

**My Personal Discussion Post :**

Let P1 be the proportion of ball bearings with diameter values less than 2.20 cm in the existing manufacturing process, and let P2 be the corresponding proportion in the new process. The null hypothesis is that the two proportions are equal, H0: P1 = P2. The alternative hypothesis is that the two proportions are not equal, Ha: P1 ≠ P2, which makes this a two-tailed test.

The level of significance is 5%. The test statistic is a z-statistic, which approximately follows the standard normal distribution when the null hypothesis is true.
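For reference, the usual pooled form of the two-proportion z-statistic is sketched below. The symbols are my own notation rather than anything defined in the post: x1 and x2 are the numbers of bearings below 2.20 cm in each sample (the script's `count1` and `count2`), and n1 and n2 are the sample sizes.

```latex
\hat{p}_1 = \frac{x_1}{n_1}, \qquad
\hat{p}_2 = \frac{x_2}{n_2}, \qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}

z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
```

The pooled proportion is used in the standard error because, under H0, both samples are assumed to share one common proportion.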

The test statistic is -0.66 and the two-tailed p-value is 0.5085. The p-value is greater than 0.05, so we fail to reject the null hypothesis at the 5% level of significance.
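The step from the test statistic to the decision can be sketched with the standard library alone. This is a minimal illustration, not part of the original script; the helper names `two_sided_p_value` and `decide` are hypothetical, and the input is the rounded statistic quoted above.

```python
import math


def two_sided_p_value(z):
    """Two-tailed p-value for a z-statistic under the standard normal distribution."""
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p_value, alpha=0.05):
    """Apply the usual decision rule at significance level alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


z = -0.66  # rounded test statistic quoted in the post
p = two_sided_p_value(z)
print("two-tailed p-value =", round(p, 2))
print("decision at 5% level:", decide(p))
```

Because the statistic is rounded to two decimals here, the p-value comes out close to, but not exactly equal to, the 0.5085 reported by the script. The decision is the same either way.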

In conclusion, we fail to reject the null hypothesis that the proportion of ball bearings with diameter values less than 2.20 cm in the existing manufacturing process is the same as the proportion in the new process. There is not enough evidence to conclude that the two proportions differ.

**My Personal Python Script :**

1) Generating Sample Data

    import pandas as pd
    import numpy as np

    # create 50 randomly chosen values from a normal distribution. (arbitrarily using mean=2.48 and standard deviation=0.500)
    diameters_sample1 = np.random.normal(2.48, 0.500, 50)

    # convert the array into a dataframe with the column name "diameters" using pandas library
    diameters_sample1_df = pd.DataFrame(diameters_sample1, columns=['diameters'])
    diameters_sample1_df = diameters_sample1_df.round(2)

    # create 50 randomly chosen values from a normal distribution. (arbitrarily using mean=2.50 and standard deviation=0.750)
    diameters_sample2 = np.random.normal(2.50, 0.750, 50)

    # convert the array into a dataframe with the column name "diameters" using pandas library
    diameters_sample2_df = pd.DataFrame(diameters_sample2, columns=['diameters'])
    diameters_sample2_df = diameters_sample2_df.round(2)

    # print the dataframe to see the first 5 observations (note that the index of dataframe starts at 0)
    print("Diameters data frame of the first sample (showing only the first five observations)")
    print(diameters_sample1_df.head())
    print()
    print("Diameters data frame of the second sample (showing only the first five observations)")
    print(diameters_sample2_df.head())

    Diameters data frame of the first sample (showing only the first five observations)
       diameters
    0       2.30
    1       2.86
    2       2.00
    3       1.95
    4       2.28

    Diameters data frame of the second sample (showing only the first five observations)
       diameters
    0       2.27
    1       2.86
    2       1.96
    3       0.97
    4       2.38
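Because the samples are drawn without a fixed seed, rerunning the script produces different diameters and therefore different counts and test results. A minimal sketch of a reproducible variant follows. It assumes NumPy's `default_rng` generator and an arbitrary seed of 42; the helper name `make_sample` is hypothetical, and none of this is part of the original script.

```python
import numpy as np
import pandas as pd

SEED = 42  # arbitrary, hypothetical seed; any fixed integer works


def make_sample(rng, mean, sd, size=50):
    """Draw `size` normal values and return them as a one-column dataframe."""
    values = rng.normal(mean, sd, size)
    return pd.DataFrame(values, columns=["diameters"]).round(2)


# one generator, seeded once, so the whole run is repeatable
rng = np.random.default_rng(SEED)
sample1_df = make_sample(rng, 2.48, 0.500)
sample2_df = make_sample(rng, 2.50, 0.750)

print(sample1_df.head())
print(sample2_df.head())
```

The seeded values will not match the output shown above, since that output came from an unseeded run.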

2) Performing Hypothesis Test For The Difference In Population Proportions

    from statsmodels.stats.proportion import proportions_ztest

    # number of observations in the first sample with diameter values less than 2.20.
    count1 = len(diameters_sample1_df[diameters_sample1_df['diameters'] < 2.20])

    # number of observations in the second sample with diameter values less than 2.20.
    count2 = len(diameters_sample2_df[diameters_sample2_df['diameters'] < 2.20])

    # counts Python list
    counts = [count1, count2]

    # number of observations in the first sample
    n1 = len(diameters_sample1_df)

    # number of observations in the second sample
    n2 = len(diameters_sample2_df)

    # n Python list
    n = [n1, n2]

    # perform the hypothesis test. output is a Python tuple that contains test_statistic and the two-sided P_value.
    test_statistic, p_value = proportions_ztest(counts, n)

    print("test-statistic =", round(test_statistic, 2))
    print("two tailed p-value =", round(p_value, 4))

    test-statistic = -0.66
    two tailed p-value = 0.5085
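As a cross-check, the test statistic can be computed by hand from the counts using the pooled standard error. The sketch below uses only the standard library. The function name `pooled_two_proportion_z` is hypothetical, and the counts of 13 and 16 out of 50 are hypothetical too: the post does not show `count1` and `count2`, so these are simply example inputs.

```python
import math


def pooled_two_proportion_z(count1, n1, count2, n2):
    """z-statistic for H0: P1 = P2 using the pooled sample proportion."""
    p1_hat = count1 / n1
    p2_hat = count2 / n2
    # under H0 both samples share one proportion, so pool the counts
    pooled = (count1 + count2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


# Hypothetical counts, NOT taken from the post.
z = pooled_two_proportion_z(13, 50, 16, 50)
print("test-statistic =", round(z, 2))
```

With these example counts the statistic rounds to -0.66, which is consistent with the output above, though other data could produce the same rounded value. The function fails with a division by zero if the pooled proportion is exactly 0 or 1, a case this sketch does not handle.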