Experimental Type Research Questions

27 Oct 2022

PDF Equation Formatted Experimental Type of Research Questions

By: Ir. Togar A. Napitupulu, MS., MSc., Ph.D

Consider the following two cases:

A newly improved marketing technique might be claimed to persuade, on average, more than 100 customers per month to buy. A sample of salespeople is trained in the new technique and their monthly sales are observed.
The hypotheses for such an experiment are
H0: μ = 100
H1: μ > 100
where μ is the population mean number of customers persuaded per salesperson per month.
From a sample of, say, n salespeople, we calculate the sample mean and the sample variance. From these two quantities we can calculate the t-statistic and the p-value for a one-sided test (see H1, which has the single direction "greater than"). If the p-value is less than α, then we have enough evidence from the sample to reject H0.
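As a rough illustration of this first case, the sketch below computes the t-statistic and a one-sided p-value for H1: μ > 100 using only the Python standard library. The sales figures, the level α = 0.05, and the helper names t_pdf and t_upper_tail are assumptions made for the example, not data or methods from the article. The p-value is obtained by numerically integrating the t density, a job a statistics package would normally do.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=20000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3  # 0.5 minus P(0 < T < |t|)
    return upper if t >= 0 else 1 - upper

# Hypothetical monthly customer counts for n = 10 trained salespeople.
sales = [104, 98, 110, 107, 101, 112, 96, 109, 105, 108]
mu0 = 100      # value of mu under H0
alpha = 0.05   # assumed significance level

n = len(sales)
mean = statistics.mean(sales)
sd = statistics.stdev(sales)           # sample standard deviation (n - 1 divisor)
t_stat = (mean - mu0) / (sd / math.sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)  # one-sided: H1 is mu > 100
reject_h0 = p_value < alpha

print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}, reject H0: {reject_h0}")
```

Because H1 points in one direction only, the whole of α sits in the upper tail, which is why the p-value is compared with α rather than α/2.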
We might want to compare two different methods of teaching or training our employees and find out which is more effective in terms of their productivity. The two methods can be thought of as two different treatments in a lab: one group of employees is treated with one method and the other group with the other method. The hypotheses to be tested might be:

H0: μ1 = μ2 versus
H1: μ1 > μ2

The alternative is one-sided because we suspect that method one is more effective than method two. Here μ1 and μ2 are the population means under the first and second methods, respectively. This can be tested with a two-sample t-test, which compares the means of two populations. Again we can use the p-value as the decision rule: either reject H0, or conclude that the sample does not provide enough evidence to reject H0.
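The second case can be sketched in the same spirit. The example below computes the pooled two-sample t-statistic for H1: μ1 > μ2 and compares it with a tabulated critical value, which leads to the same decision as the p-value rule at the same α. The productivity scores are invented, equal population variances are assumed (the pooled form of the test), and 1.812 is the upper 5% point of the t distribution with 10 degrees of freedom as given in standard tables.

```python
import math
import statistics

# Hypothetical productivity scores for two independent groups of employees.
method1 = [78, 85, 82, 88, 80, 85]
method2 = [75, 79, 72, 80, 77, 73]

n1, n2 = len(method1), len(method2)
mean1, mean2 = statistics.mean(method1), statistics.mean(method2)
var1, var2 = statistics.variance(method1), statistics.variance(method2)

# Pooled variance: assumes both populations share the same variance.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / std_err

# Upper 5% point of t with 10 degrees of freedom (standard table value);
# it must be looked up again if alpha or the sample sizes change.
t_crit = 1.812
reject_h0 = t_stat > t_crit

print(f"t = {t_stat:.3f} on {df} df, reject H0: {reject_h0}")
```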

Both cases are called one-factor problems. The factor is the treatment, which has two levels. In the first case, one level is the newly improved marketing method, from which a sample was taken. The other level has no sample; it is simply a constant, whose value can be thought of as having been derived from previously taken samples. In the second case, the levels are method 1 and method 2, and a sample was taken from each. The following is a case where there are more than two levels.

One-Way ANOVA: Sometimes we might want to test three or more treatments, that is, to compare the means of three or more populations. In such a case we can no longer use the t-test. We have to resort to a method called Analysis of Variance (ANOVA). The hypotheses for such a problem are:
H0: μ1 = μ2 = μ3 = … = μk versus
H1: At least two of the means are not equal to one another
where k is the number of treatments or populations. The test statistic used is the F-statistic (F-test). Again the p-value can be used to decide whether to reject H0 or to conclude that there is not enough evidence to reject it. When H0 is rejected, we still need to find out which particular pairs of means differ. To test this, we use the Tukey-Kramer procedure. When there are k means to compare, there are k(k-1)/2 pairs to test. For example, we might want to compare more than two methods of marketing and find out which one has the greatest impact. Another example: we might want to compare three or more training methods for our employees and find out which one is the best.
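The pair count can be checked by listing the comparisons directly. The short sketch below enumerates every unordered pair of treatments, writing k for the number of treatment means; the method labels are hypothetical, and the code only counts the comparisons. It does not carry out the Tukey-Kramer procedure itself.

```python
from itertools import combinations

# Hypothetical labels for k = 4 marketing methods.
methods = ["A", "B", "C", "D"]
k = len(methods)

pairs = list(combinations(methods, 2))  # every unordered pair of treatments
expected_pairs = k * (k - 1) // 2

for first, second in pairs:
    print(f"compare {first} vs {second}")
print(f"{len(pairs)} pairwise comparisons for k = {k}")
```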
Notes on strategy of experimental design: Experimental units (such as students, mice, companies, products, etc.) provide the heterogeneity that leads to experimental error. To keep this heterogeneity from systematically favoring any treatment, we randomly assign the factor levels to the experimental units.
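One simple way to carry out such a random assignment is to shuffle a balanced list of treatment labels and pair it with the units. The sketch below assumes 12 experimental units and 3 treatments with equal group sizes; the unit names, treatment labels, and seed are all made up for illustration.

```python
import random
from collections import Counter

random.seed(2022)  # fixed seed so the illustration is reproducible

# Hypothetical setup: 12 experimental units and k = 3 treatments, 4 units each.
units = [f"unit_{i:02d}" for i in range(1, 13)]
treatments = ["T1", "T2", "T3"]
labels = treatments * (len(units) // len(treatments))

random.shuffle(labels)                  # random order of treatment labels
assignment = dict(zip(units, labels))   # unit -> randomly assigned treatment

group_sizes = Counter(assignment.values())
for unit, treatment in assignment.items():
    print(unit, treatment)
print(dict(group_sizes))
```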
ASSUMPTION: The k populations are independent and normally distributed with means $\mu_1, \mu_2, \ldots, \mu_k$ and common variance $\sigma^2$. This assumption is supported by the randomization.
THE MODEL:
Let $y_{ij}$ denote the jth observation from the ith treatment. Each observation may be written in the form:
$$Y_{ij} = \mu_i + \epsilon_{ij}$$
where $\epsilon_{ij}$ measures the deviation of the jth observation of the ith sample from the corresponding treatment mean, the same role as the error term in regression. Another way of writing this equation is to substitute $\mu_i = \mu + \alpha_i$, subject to the constraint $\sum_{i=1}^{k} \alpha_i = 0$. Hence we may write
$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij},$$
where $\alpha_i$ is called the effect of the ith treatment and $\mu$ is the grand mean of all the $\mu_i$'s. The hypotheses above can now be replaced by:
$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0,$$
$$H_1: \text{at least one of the } \alpha_i\text{'s is not equal to zero.}$$
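The reparameterization above can be made concrete with a few numbers. The sketch below takes three hypothetical treatment means, computes the grand mean μ and the effects α_i = μ_i - μ, and confirms that the effects sum to zero; the values are invented purely to illustrate the definitions.

```python
# Hypothetical treatment means mu_i for k = 3 treatments.
treatment_means = [10.0, 12.0, 17.0]
k = len(treatment_means)

grand_mean = sum(treatment_means) / k                  # mu
effects = [m - grand_mean for m in treatment_means]    # alpha_i = mu_i - mu

# The effects sum to zero by construction, and H0 holds only if all are zero.
effects_sum = sum(effects)
h0_holds = all(abs(a) < 1e-12 for a in effects)

print(f"mu = {grand_mean}, alpha = {effects}, sum = {effects_sum}")
print(f"H0 (all effects zero) holds: {h0_holds}")
```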
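Testing the hypothesis is based on a comparison of two independent estimates of the common population variance $\sigma^2$, obtained from the following sum-of-squares identity, in which n is the number of observations per treatment, $\bar{y}_{i.}$ is the mean of the n observations in the ith treatment, and $\bar{y}_{..}$ is the mean of all kn observations:
$$\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{..})^2 = n\sum_{i=1}^{k}(\bar{y}_{i.}-\bar{y}_{..})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2$$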
That is
SST = SSA + SSE
Where
SST = total sum of squares.
SSA = treatment sum of squares.
SSE = error sum of squares (within treatment sum of squares).
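The identity is easy to verify numerically. The sketch below computes the three sums of squares separately for a small invented data set (3 treatments, 5 observations each) and checks that the total equals the treatment part plus the error part.

```python
# Hypothetical data: k = 3 treatments with n = 5 observations each.
data = [
    [2, 5, 4, 6, 3],
    [5, 7, 6, 8, 4],
    [7, 9, 8, 10, 6],
]
k, n = len(data), len(data[0])

grand_mean = sum(sum(row) for row in data) / (k * n)
row_means = [sum(row) / n for row in data]

# Each sum of squares is computed directly from its definition.
sst = sum((y - grand_mean) ** 2 for row in data for y in row)
ssa = n * sum((m - grand_mean) ** 2 for m in row_means)
sse = sum((y - m) ** 2 for row, m in zip(data, row_means) for y in row)

print(f"SST = {sst}, SSA = {ssa}, SSE = {sse}")
print(f"SSA + SSE = {ssa + sse}")
```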
It can be shown that the expected value of SSA is
$$E(\mathrm{SSA}) = (k-1)\sigma^2 + n\sum_{i=1}^{k}\alpha_i^2.$$
If H0 is true, so that each $\alpha_i$ in the above formula is equal to zero, then
$$E(s_1^2) = E\left(\frac{\mathrm{SSA}}{k-1}\right) = \sigma^2,$$
that is, $s_1^2$ is an unbiased estimate of $\sigma^2$. However, if H1 is true, we have
$$E(s_1^2) = E\left(\frac{\mathrm{SSA}}{k-1}\right) = \sigma^2 + \frac{n}{k-1}\sum_{i=1}^{k}\alpha_i^2,$$
that is, $s_1^2$ estimates $\sigma^2$ plus an additional term, which measures variation due to the systematic effects. We also know that another independent estimator of $\sigma^2$, based on $k(n-1)$ degrees of freedom, is the familiar $s^2$,
$$s^2 = \frac{\mathrm{SSE}}{k(n-1)}.$$
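These two expectations can be checked by simulation. The sketch below repeatedly generates data from the model with nonzero treatment effects (so H1 is true) and averages the two mean squares over many replications. The settings (k = 3, n = 5, σ = 1, effects -1, 0, 1, and 20000 replications) are arbitrary choices for illustration; with them, the formula for E(s_1^2) gives 1 + (5/2)(2) = 6, while s^2 should still average close to σ^2 = 1.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Assumed settings: k = 3 treatments, n = 5 observations each, sigma = 1.
mu, sigma, n = 10.0, 1.0, 5
alphas = [-1.0, 0.0, 1.0]   # treatment effects, summing to zero (H1 is true)
k = len(alphas)
reps = 20000

sum_s1_sq = 0.0  # running total of SSA / (k - 1)
sum_s_sq = 0.0   # running total of SSE / (k (n - 1))
for rep in range(reps):
    data = [[mu + a + random.gauss(0.0, sigma) for j in range(n)] for a in alphas]
    row_means = [sum(row) / n for row in data]
    grand_mean = sum(row_means) / k  # valid because every group has n values
    ssa = n * sum((m - grand_mean) ** 2 for m in row_means)
    sse = sum((y - m) ** 2 for row, m in zip(data, row_means) for y in row)
    sum_s1_sq += ssa / (k - 1)
    sum_s_sq += sse / (k * (n - 1))

avg_s1_sq = sum_s1_sq / reps
avg_s_sq = sum_s_sq / reps
theory_s1_sq = sigma**2 + n / (k - 1) * sum(a * a for a in alphas)

print(f"average s1^2 = {avg_s1_sq:.3f} (theory {theory_s1_sq:.3f})")
print(f"average s^2  = {avg_s_sq:.3f} (theory {sigma**2:.3f})")
```

Setting all entries of alphas to zero would make both averages settle near σ^2, which is the situation under H0.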
Now, when H0 is true, the ratio $F = s_1^2 / s^2$ is a random variable having the F-distribution with $k-1$ and $k(n-1)$ degrees of freedom. Since $s_1^2$ overestimates $\sigma^2$ when H0 is false, we have a one-tailed test with the critical region entirely in the right tail of the F-distribution. That is, we use the F-ratio to test the equality of the means (see the hypotheses above).
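A small worked example of the F-ratio follows. The data are invented (3 treatments, 5 observations each) and α = 0.05 is an assumed level. To stay within the standard library, the right-tail p-value uses a closed form that is valid only when the numerator has 2 degrees of freedom, i.e. k = 3; for any other k an F-distribution routine from a statistics package would be needed.

```python
# Hypothetical data: k = 3 treatments with n = 5 observations each.
data = [
    [12, 15, 14, 16, 13],
    [15, 17, 16, 18, 14],
    [17, 19, 18, 20, 16],
]
k, n = len(data), len(data[0])
alpha = 0.05  # assumed significance level

grand_mean = sum(sum(row) for row in data) / (k * n)
row_means = [sum(row) / n for row in data]
ssa = n * sum((m - grand_mean) ** 2 for m in row_means)
sse = sum((y - m) ** 2 for row, m in zip(data, row_means) for y in row)

df1, df2 = k - 1, k * (n - 1)
s1_sq = ssa / df1          # treatment mean square
s_sq = sse / df2           # error mean square
f_ratio = s1_sq / s_sq

# Right-tail p-value. This closed form is valid only when df1 == 2 (k = 3);
# other cases need an F-distribution routine from a statistics library.
assert df1 == 2
p_value = (1 + 2 * f_ratio / df2) ** (-df2 / 2)
reject_h0 = p_value < alpha

print(f"F = {f_ratio:.3f} on ({df1}, {df2}) df, p = {p_value:.4f}, reject H0: {reject_h0}")
```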
