Data Analysis for Health Care Based on Their Productivity
Question:
An investor needs to make a decision on whether to acquire one of two medical clinics based on their productivity, as measured by the total number of visits per month. You have been asked whether there is a significant difference in the total number of visits per month between clinic 1 and clinic 2.
Answer:
Defining the Hypothesis:
From a theoretical point of view, the null and alternative hypotheses are:
H0: The two clinics, Clinic_1 and Clinic_2, have the same productivity
Vs
H1: The two clinics do not have the same productivity
Checking Assumptions:
The data set contains two groups, Clinic_1 and Clinic_2; the first few observations are shown below. We want to test whether the two groups have the same productivity or whether they differ significantly.
| Clinic_1 | Clinic_2 |
|----------|----------|
| 140      | 169      |
| 126      | 151      |
| 30       | 175      |
| 130      | 115      |
| 193      | 167      |
In this type of scenario, a two-sample t-test (also called an independent-samples t-test) is appropriate, provided the data satisfy the following assumptions of the test (Xu et al., 2017):
There is one continuous dependent variable;
The two samples are independent;
The two samples follow normal distributions, which can be verified with a normality check.
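The normality assumption can be given a rough check from summary statistics alone. The sketch below is one possible approach, not the method used in this analysis: it computes a Jarque-Bera statistic from the skewness and excess kurtosis reported in the descriptive statistics table of this document. The function name `jarque_bera_from_summary` is hypothetical. Two assumptions matter here: the reported skewness and kurtosis may be bias-adjusted sample estimates rather than the raw moments the statistic formally uses, and the chi-square approximation is only asymptotic, so the p-values are indicative at best.

```python
import math

def jarque_bera_from_summary(n, skewness, excess_kurtosis):
    """Jarque-Bera statistic and asymptotic p-value from summary statistics.

    Under normality, JB is approximately chi-square with 2 degrees of
    freedom, whose survival function is exp(-x / 2).
    """
    jb = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    p_value = math.exp(-jb / 2)
    return jb, p_value

# Values taken from the descriptive statistics table (Count, Skewness, Kurtosis).
jb_1, p_1 = jarque_bera_from_summary(100, -0.50511, -0.43575)
jb_2, p_2 = jarque_bera_from_summary(100, -0.06286, 0.264653)

print(f"Clinic_1: JB = {jb_1:.3f}, p = {p_1:.3f}")
print(f"Clinic_2: JB = {jb_2:.3f}, p = {p_2:.3f}")
```

A p-value above the chosen significance level means this rough check does not flag a departure from normality. With the raw data available, a dedicated test or a Q-Q plot would be more reliable.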
So, our Hypothesis in this case will be:
H_0: μ_1 = μ_2 ("the 2 population means are equal")
Vs
H_1: μ_1 < μ_2 ("the 1st population mean is less than the 2nd population mean")
This is a one-tailed (left-tailed) t-test.
Where,
μ_1: the population mean for Clinic_1
μ_2: the population mean for Clinic_2
Note that subtracting μ_2 from both sides gives an equivalent form of these hypotheses: H_0: μ_1 - μ_2 = 0 versus H_1: μ_1 - μ_2 < 0.
Descriptive statistics
| Measure            | Clinic_1 | Clinic_2 |
|--------------------|----------|----------|
| Mean               | 124.32   | 145.03   |
| Standard Error     | 4.678187 | 3.978083 |
| Median             | 134.5    | 149.5    |
| Mode               | 150      | 175      |
| Standard Deviation | 46.78187 | 39.78083 |
| Sample Variance    | 2188.543 | 1582.514 |
| Kurtosis           | -0.43575 | 0.264653 |
| Skewness           | -0.50511 | -0.06286 |
| Range              | 183      | 221      |
| Minimum            | 24       | 42       |
| Maximum            | 207      | 263      |
| Sum                | 12432    | 14503    |
| Count              | 100      | 100      |
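As an illustration of how such measures are obtained, the sketch below computes several of them with Python's standard `statistics` module. It uses only the five observations per clinic displayed earlier in this document, so its output will not match the full-sample table above (Count = 100). The helper name `describe` is hypothetical.

```python
import math
import statistics

# Only the five displayed observations per clinic, not the full data set.
clinic_1 = [140, 126, 30, 130, 193]
clinic_2 = [169, 151, 175, 115, 167]

def describe(visits):
    """Return a few descriptive statistics for a list of monthly visit counts."""
    n = len(visits)
    sd = statistics.stdev(visits)  # sample standard deviation (n - 1 denominator)
    return {
        "count": n,
        "sum": sum(visits),
        "mean": statistics.mean(visits),
        "median": statistics.median(visits),
        "variance": statistics.variance(visits),  # sample variance, equals sd ** 2
        "sd": sd,
        "se": sd / math.sqrt(n),  # standard error of the mean
        "min": min(visits),
        "max": max(visits),
        "range": max(visits) - min(visits),
    }

stats_1 = describe(clinic_1)
stats_2 = describe(clinic_2)
for name, stats in (("Clinic_1", stats_1), ("Clinic_2", stats_2)):
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

The same relationships hold in the full-sample table: the standard error is the standard deviation divided by the square root of the count, and the sample variance is the square of the standard deviation.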
From the above table, the sample variances of the two groups differ noticeably (2188.543 for Clinic_1 versus 1582.514 for Clinic_2), so we do not assume equality of variances. The test statistic is therefore:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where
$\bar{x}_1$ = Mean of first sample (i.e. mean of data on Clinic_1)
$\bar{x}_2$ = Mean of second sample (i.e. mean of data on Clinic_2)
$n_1$ = Sample size (i.e. number of observations) of first sample (i.e. sample size of data on Clinic_1)
$n_2$ = Sample size (i.e. number of observations) of second sample (i.e. sample size of data on Clinic_2)
$s_1$ = Standard deviation of first sample (i.e. standard deviation of data on Clinic_1)
$s_2$ = Standard deviation of second sample (i.e. standard deviation of data on Clinic_2)
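The calculated t value is then compared with the critical t value from the t distribution table at the chosen significance level, with degrees of freedom (df)
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$
Because the test is left-tailed, the rejection region lies in the lower tail: if the calculated t value is less than the negative of the critical t value (t < -t_critical), we reject the null hypothesis.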
Note that this form of the independent-samples t-test statistic does not assume equal variances (Gerald, 2018). This is why both the denominator of the test statistic and the degrees of freedom used for the critical value of t differ from those in the equal-variances form of the test.
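The unequal-variances test statistic and its degrees of freedom can be computed directly from the summary values in the descriptive statistics table. The following is a minimal sketch using only the standard library; the function name `welch_t` is hypothetical, and the inputs are the means, standard deviations and counts reported in the table.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variances t statistic and its approximate degrees of freedom."""
    v1 = sd1 ** 2 / n1  # s_1^2 / n_1
    v2 = sd2 ** 2 / n2  # s_2^2 / n_2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary values from the descriptive statistics table.
t_stat, df = welch_t(124.32, 46.78187, 100, 145.03, 39.78083, 100)
print(f"t = {t_stat:.5f}, df = {df:.2f}")
```

With these inputs the t statistic agrees with the value reported in the results section, and the degrees of freedom come out close to 193.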
P-value: The p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct. The p-value is used as an alternative to rejection points: it gives the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means stronger evidence against the null hypothesis and in favour of the alternative hypothesis.
In simpler terms, the p-value tells us how surprising the observed data would be if the null hypothesis were true; we reject the null hypothesis when the p-value does not exceed the level of significance.
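For a left-tailed test, the p-value is the area under the t density to the left of the calculated t value. The sketch below approximates that area by numerical integration with Simpson's rule, using only the standard library. The function names are hypothetical, and df = 193 is an assumption: it is the rounded result of applying the degrees-of-freedom formula to the variances in the descriptive statistics table. In practice a statistics package would normally be used for this step.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def left_tail_p(t_stat, df, width=60.0, steps=20000):
    """Approximate P(T <= t_stat) with Simpson's rule on [t_stat - width, t_stat].

    The area left of t_stat - width is ignored, which is negligible for
    large df but not for very small df (heavy tails).
    """
    a = t_stat - width
    h = width / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return total * h / 3

# t value as reported in the results; df = 193 is an assumption (see above).
p_value = left_tail_p(-3.37247, 193.0)
print(f"one-tailed p-value = {p_value:.5f}")
```

Since the hypothesis is one-sided, this area is not doubled, as it would be for a two-tailed test.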
Level of Significance (α): When testing a hypothesis, two types of error can occur: a Type I error (rejecting a null hypothesis that is true) and a Type II error (failing to reject a null hypothesis that is false). Ideally, a testing rule would keep the probabilities of both errors small. Because a Type I error (E_I) is treated as more serious than a Type II error (E_II), an upper bound is placed on the probability of a Type I error; that is, we consider only tests whose Type I error probability is less than or equal to that bound. This upper bound is called the Level of Significance (LOS).
Mathematically, a test is said to be a level α test (0 ≤ α ≤ 1) if
P(E_I) ≤ α [Note: a level α test is also a level α* test for any α* > α]
In this case we proceed with the standard 5% level of significance (α = 0.05).
Drawing conclusions:
If p-value ≤ α, then we reject the null hypothesis at the α level of significance (here, 5%).
Or
If the calculated t value < -t critical value (the left-tail rejection region), then we reject the null hypothesis.
Interpretation of Results:
In our case the p-value is 0.00045 < 0.05, so we reject the null hypothesis at the 5% level of significance.
Also, the calculated t value is -3.37247, which is less than the left-tail critical value -1.652787. So, by this criterion too, we reject the null hypothesis.
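The two decision rules can be written as a short check. This is a minimal sketch using the values reported above; the function name `reject_null_left_tailed` is hypothetical, and the critical value is passed as a positive magnitude and negated inside the function because the test is left-tailed.

```python
def reject_null_left_tailed(t_stat, t_critical, p_value, alpha=0.05):
    """Apply both decision rules for a left-tailed t-test.

    Returns a pair (reject_by_p_value, reject_by_critical_value).
    """
    reject_by_p = p_value <= alpha
    reject_by_t = t_stat < -abs(t_critical)  # rejection region is the lower tail
    return reject_by_p, reject_by_t

# Values as reported in the interpretation of results.
by_p, by_t = reject_null_left_tailed(
    t_stat=-3.37247, t_critical=1.652787, p_value=0.00045, alpha=0.05
)
print(f"reject by p-value: {by_p}, reject by critical value: {by_t}")
```

Both rules are equivalent for a given test, so they should always lead to the same decision; a disagreement would indicate a calculation or sign error.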
Decision for administration:
So, from the administrative point of view, we can conclude that the difference in the total number of visits per month between the two clinics is statistically significant at the 5% level. The data support the alternative hypothesis that the population mean for Clinic_1 is less than that for Clinic_2 (μ_1 - μ_2 < 0), so on the basis of productivity it is better to invest in Clinic_2 than in Clinic_1.
References
Xu, M., Fralick, D., Zheng, J. Z., Wang, B., Tu, X. M., & Feng, C. (2017). The differences and similarities between two-sample t-test and paired t-test. Shanghai Archives of Psychiatry, 29(3), 184.