5 Validation

To validate our claims that both vectors create samples that match the truncated triangular distribution, a few hypothesis tests are in order.

First, we validate that the sample variance and the distribution variance match; second, we evaluate whether their mean statistics match. If both of these tests pass, we perform a goodness of fit test: specifically, a Kolmogorov-Smirnov test or a Chi-Square test.

These tests rely on a significance level (\(\alpha\)) which we have appropriately picked to be 0.1. The smaller the significance level, the more certain you can be that a rejection reflects the true behavior of your random variable and is not a product of chance.

Usually, when using the more common procedures for testing hypotheses, our data needs to meet a few criteria.

5.1 Hypothesis test for Variance

Specifically, when testing for variance equivalence using a Chi-Square (\(\chi^{2}\)) test, we should validate that the sample variance, suitably scaled, follows a \(\chi^{2}\) distribution.

This usually isn’t the case, but, at the cost of introducing some uncertainty, the condition can be assumed if the distribution of the data roughly resembles a normal distribution and the sample size is big enough (our sample size is 10000, so no need to worry there).

If we didn’t know that our data follows a triangular distribution this could be a fairly safe assumption, but because that really isn’t the case, more advanced techniques like bootstrapping should be used while performing this test. Such is left as an exercise to the reader.
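As a starting point for that exercise, here is a minimal bootstrap sketch. Note the hedging: the sample `z`, its size, and the uniform stand-in distribution are placeholders for illustration, not the chapter's simulated vectors or the truncated triangular variance.

```r
## Bootstrap the sample variance and check whether the distribution
## variance falls inside a 90% percentile interval.
## `z` is a placeholder sample; substitute the simulated vectors and the
## true variance of the truncated triangular distribution in practice.
set.seed(42)
z <- runif(10000, min = 3, max = 8)   ## placeholder data (assumption)
sigma.squared <- (8 - 3) ^ 2 / 12     ## true variance of Uniform(3, 8)

B <- 2000                             ## number of bootstrap resamples
boot.vars <- replicate(B, var(sample(z, replace = TRUE)))

## Reject H0 (equal variances) if sigma.squared falls outside the interval
ci <- quantile(boot.vars, probs = c(0.05, 0.95))
reject <- sigma.squared < ci[1] | sigma.squared > ci[2]
```

This avoids the normality assumption entirely: the sampling distribution of the variance is approximated by resampling rather than by the \(\chi^{2}\) result.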

Let’s study the mathematical procedure.

Since we want to validate that the variance of the sample is the same as that of the distribution (some value \(\sigma^{2}\)), we make opposing statements about it. We call these statements our null hypothesis and our alternate hypothesis. In the end, we should be able to say, with some certainty, whether or not we may reject the null hypothesis.

\[ H_{0}:\hat{\sigma} ^{2} = \sigma^{2} \]

\[ H_{a}:\hat{\sigma}^{2} \neq \sigma^{2} \]

We then calculate the sample variance:

\[ S^{2}=\sum_{i=1}^{n}\frac{(x_{i}-\bar{X})^{2}}{n-1} \]

and plug it into the test statistic:

\[ \left(\frac{S}{\sigma}\right)^{2}(n-1)\sim\chi^{2}_{n-1}. \]

The alternate hypothesis tells us that this is a two-tailed test, meaning we must use both ends of the \(\chi^{2}\) distribution. We reject \(H_{0}\) if the test statistic falls outside the interval bounded by the \(\chi^{2}\) inverse CDF evaluated at \(1-\frac{\alpha}{2}\) and at \(\frac{\alpha}{2}\) with \(n-1\) degrees of freedom. This may seem unorthodox, but the most common notation used to reference these bounds is \(\chi^{2}_{1-\frac{\alpha}{2},n-1}\) and \(\chi^{2}_{\frac{\alpha}{2},n-1}\).

\[ \chi^{2}_{\frac{\alpha}{2},n-1}\leq\left(\frac{S}{\sigma}\right)^{2}(n-1)\leq \chi^{2}_{1-\frac{\alpha}{2},n-1} \Rightarrow \hat{\sigma}^{2}=\sigma^{2} \]

We shall now implement this in R. Do consider that there is no exact method to calculate the inverse CDF of the \(\chi^{2}\) distribution that we could reasonably implement, such as a closed form expression, and so we must use R’s native function to find approximate values. Most mathematical software comes with its own implementation of this method or has third party libraries to extend its functionality (such as SciPy for Python).

Do remember that we have stored the simulated data for each method in their respective variables: simulated.Z.prime.Inv.CDF and simulated.Z.prime.t.and.e, and the truncated triangular distribution object in the variable my.trun.tri.dist.

## Inverse CDF method

alpha <- 0.1
degrees.of.freedom <- (length(simulated.Z.prime.Inv.CDF) - 1)

## 1. Sample Variance and true variance
s.squared.Inv.CDF <- sum((mean(simulated.Z.prime.Inv.CDF) - simulated.Z.prime.Inv.CDF) ^ 2) / degrees.of.freedom

sigma.squared <- my.trun.tri.dist$trun.tri.var

## 2. Test Statistic
test.statistic <- s.squared.Inv.CDF * degrees.of.freedom / sigma.squared

## 3. Evaluate if statistic falls within bounds
lower.bound.criteria <- test.statistic >= qchisq(p = alpha / 2, df = degrees.of.freedom)
upper.bound.criteria <- test.statistic <= qchisq(p = 1 - alpha / 2, df = degrees.of.freedom)

## 4. Conclude:
print(paste(
  "With a significance level of 0.1, we ",
  if (lower.bound.criteria & upper.bound.criteria) {
    "may not reject the null hypothesis"
  } else {
    "may reject the null hypothesis"
  },
  sep = ""))
## [1] "With a significance level of 0.1, we may not reject the null hypothesis"

Effectively, via the inverse CDF method we generate a sample with the same variance as the underlying truncated triangular distribution.

## Trial and Error

degrees.of.freedom <- (length(simulated.Z.prime.t.and.e) - 1)

## 1. Sample Variance
s.squared.t.and.e <- sum((mean(simulated.Z.prime.t.and.e) - simulated.Z.prime.t.and.e) ^ 2) / degrees.of.freedom

## 2. Test Statistic
test.statistic <- s.squared.t.and.e * degrees.of.freedom / sigma.squared

## 3. Evaluate if statistic falls within bounds
lower.bound.criteria <- test.statistic >= qchisq(p = alpha / 2, df = degrees.of.freedom)
upper.bound.criteria <- test.statistic <= qchisq(p = 1 - alpha / 2, df = degrees.of.freedom)

## 4. Conclude:
print(paste(
  "With a significance level of 0.1, we ",
  if (lower.bound.criteria & upper.bound.criteria) {
    "may not reject the null hypothesis"
  } else {
    "may reject the null hypothesis"
  },
  sep = ""))
## [1] "With a significance level of 0.1, we may not reject the null hypothesis"

Effectively, via the trial and error method we generate a sample with the same variance as the underlying truncated triangular distribution.

We may move on to test for the mean.

5.2 Hypothesis test for mean

Specifically, when testing for mean equivalence using a t-test, we should validate that the sample mean follows a normal distribution. This does not mean that the data should be normally distributed.

This assumption, because of the central limit theorem, should hold up for fairly large datasets like ours. Roughly speaking, this can be assumed of any dataset with over 50 complete observations.
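To see the theorem at work, here is a small sketch; the exponential samples are a stand-in for any skewed distribution (an assumption for illustration, not the chapter's data):

```r
## Sample means of a skewed distribution look approximately normal:
## draw 500 exponential samples of size 50 and inspect their means.
set.seed(3)
sample.means <- replicate(500, mean(rexp(50, rate = 1)))

## The means concentrate around the true mean (1) with small spread,
## even though each individual sample is heavily right-skewed.
c(mean(sample.means), sd(sample.means))

## A Shapiro-Wilk test on the means typically fails to reject normality
shapiro.p <- shapiro.test(sample.means)$p.value
```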

Let’s study the mathematical procedure.

We now want to validate that the mean of the sample is the same as that of the distribution (some value \(\mu\)), so we make opposing statements about it.

\[ H_{0}:\hat{\mu} = \mu \]

\[ H_{a}:\hat{\mu} \neq \mu \]

We then calculate the sample mean:

\[ \bar{X}=\frac{\sum_{i=1}^{n}x_{i}}{n} \]

and plug it into the test statistic:

\[ \frac{\bar{X}-\mu}{\frac{S}{\sqrt{n}}} \sim t_{n-1}. \]

The alternate hypothesis tells us that this is a two-tailed test, meaning we must use both ends of Student’s t distribution. We reject \(H_{0}\) if the test statistic falls outside the interval bounded by the t inverse CDF evaluated at \(1-\frac{\alpha}{2}\) and at \(\frac{\alpha}{2}\) with \(n-1\) degrees of freedom. This may seem unorthodox, but the most common notation used to reference these bounds is \(t_{1-\frac{\alpha}{2},n-1}\) and \(t_{\frac{\alpha}{2},n-1}\).

\[ t_{\frac{\alpha}{2},n-1}\leq\frac{\bar{X}-\mu}{\frac{S}{\sqrt{n}}}\leq t_{1-\frac{\alpha}{2},n-1} \Rightarrow \hat{\mu}=\mu \]

We shall now implement this in R. Same as with the test for variance, there is no exact method to calculate the inverse CDF of Student’s t distribution that we could reasonably implement, and so we must use R’s native function to find approximate values.

## Inverse CDF method

## 1. Sample mean and true mean
x.bar.Inv.CDF <- mean(simulated.Z.prime.Inv.CDF)

mu <- my.trun.tri.dist$trun.tri.mean

## 2. Test Statistic (note: we use the sample variance S, as in the formula above)
degrees.of.freedom <- (length(simulated.Z.prime.Inv.CDF) - 1)
test.statistic <- (x.bar.Inv.CDF - mu) / (sqrt(s.squared.Inv.CDF / length(simulated.Z.prime.Inv.CDF)))

## 3. Evaluate if statistic falls within bounds
lower.bound.criteria <- test.statistic >= qt(p = alpha / 2, df = degrees.of.freedom)
upper.bound.criteria <- test.statistic <= qt(p = 1 - alpha / 2, df = degrees.of.freedom)

## 4. Conclude:
print(paste(
  "With a significance level of 0.1, we ",
  if (lower.bound.criteria & upper.bound.criteria) {
    "may not reject the null hypothesis"
  } else {
    "may reject the null hypothesis"
  },
  sep = ""))
## [1] "With a significance level of 0.1, we may not reject the null hypothesis"

Effectively, via the inverse CDF method we generate a sample with the same mean as the underlying truncated triangular distribution.

## Trial and Error

## 1. Sample mean and true mean
x.bar.t.and.e <- mean(simulated.Z.prime.t.and.e)

mu <- my.trun.tri.dist$trun.tri.mean

## 2. Test Statistic (again using the sample variance S)
test.statistic <- (x.bar.t.and.e - mu) / (sqrt(s.squared.t.and.e / length(simulated.Z.prime.t.and.e)))

## 3. Evaluate if statistic falls within bounds
lower.bound.criteria <- test.statistic >= qt(p = alpha / 2, df = degrees.of.freedom)
upper.bound.criteria <- test.statistic <= qt(p = 1 - alpha / 2, df = degrees.of.freedom)

## 4. Conclude:
print(paste(
  "With a significance level of 0.1, we ",
  if (lower.bound.criteria & upper.bound.criteria) {
    "may not reject the null hypothesis"
  } else {
    "may reject the null hypothesis"
  },
  sep = ""))
## [1] "With a significance level of 0.1, we may not reject the null hypothesis"

Effectively, via the trial and error method we generate a sample with the same mean as the underlying truncated triangular distribution.

We may move on to the goodness of fit tests.

5.3 Goodness of Fit Tests

Goodness of fit tests, for our purposes, serve as a means of validating paired statements about the distribution of our data. Specifically, our null hypothesis would state that the sampled random variable follows a PDF with its respective estimated parameters while our alternate hypothesis tells us exactly the opposite.

In terms of hypothesis:

\[ H_{0}: \hat{f}(x) = f(x) \] \[ H_{a}: \hat{f}(x) \neq f(x) \Leftrightarrow \neg H_{0}. \]

Multiple tests have been devised for goodness of fit, each with their own strengths. It is up to us to determine the most sound method.

5.3.1 Chi-Square Test

The \(\chi^2\) test is particularly useful when fitting discrete distributions to samples of moderate size, though the procedure can be extended to continuous distributions by binning the domain into intervals or classes of equal probability. For samples as large as ours this approach is perhaps not well suited: we require many bins to split the distribution, the probability assigned to each becomes very narrow, and the test is then likely to fail for most distributions. Nonetheless, the procedure is enriching.

  1. Estimate the parameters of the distribution you suspect the data should follow. We will use the parameters of the distribution we want the sample to resemble instead (those of the truncated triangular distribution).

Let \(X \sim f(x)=Triangular(L,U,M)\) and \(Z \sim h(x) = f(z| a < X \leq b)\)

where \(L = 2\), \(U = 9\), \(M = 7\), \(a = 3\), and \(b = 8\).

  2. Estimate the number of classes or bins to categorize the continuous distribution. Often, for large datasets, it is recommended to pick \(\frac{\sqrt{n}}{5} \leq k \leq \sqrt{n}\).

\[ 20 \leq k \leq 100. \]

We will take \(k\) to be 100.

  3. Calculate the probability that every class should have (\(p=1/k\)) and find the bounds for each class with said probability.

Let \(H(x)\) be the CDF of the random variable \(Z\).

We define the lower and upper bounds of each class \(i\in \{ 1,\cdots, k\}\) to be \(l_{i}\) and \(u_{i}\). A value \(z\) is said to belong to class \(i\) if \(l_{i} \leq z < u_{i}\).

\[ l_{i} = H^{-1}(p(i-1)), \ u_{i} = H^{-1}(p \cdot i) \]

  4. Once all classes have had their bounds calculated, we may count how many members of the random sample belong to each class (\(o_{i}\), the observed frequency), and how many members we expected to belong according to its assigned probability (\(e_{i} = p \cdot n\), the expected frequency).

  5. We may now calculate the test statistic

\[ \sum_{i=1}^{k}\frac{(o_{i}-e_{i})^{2}}{e_{i}} \sim \chi^{2}_{k-r-1} \]

Notice how the statistic follows a \(\chi^{2}\) distribution with \(k-r-1\) degrees of freedom, where \(k\) is the number of classes and \(r\) is the number of parameters of the distribution (5 for the truncated triangular).

  6. Once again, consider your level of significance and evaluate the critical value past which the null hypothesis should be rejected. This is a right-tailed test.

\[ \sum_{i=1}^{k}\frac{(o_{i}-e_{i})^{2}}{e_{i}}\leq \chi^{2}_{1-\alpha,k-r-1} \Rightarrow \hat{h}(z)=h(z) \]

Let’s see how we could go about doing this in R. First, for the inverse CDF method.

## Inverse CDF sample

## Find the intervals for each class i in {1, ..., k}
set.1.to.k <- 1:100
p <- 1/100
l_i <- sapply(p * (set.1.to.k - 1), my.trun.tri.dist$inverse.cdf)
u_i <- sapply(p * set.1.to.k, my.trun.tri.dist$inverse.cdf)

## Find the observed frequency of every class
o_i <-
  sapply(set.1.to.k, function(i) {
    sum(simulated.Z.prime.Inv.CDF >= l_i[i]
      & simulated.Z.prime.Inv.CDF < u_i[i]
    )
  })

## Find the expected frequency
e_i <- p * 10000

## Calculate the test statistic
test.statistic <- sum(((o_i - e_i) ^ 2) / e_i)

## Calculate the Chi Square critical value
critical.value <- qchisq(1 - alpha, df = 100 - 5 - 1)

## Conclude
print(paste(
  "With a significance level of 0.1, we ",
  if (test.statistic <= critical.value) {
    "may not reject the null hypothesis"
  } else {
    "may reject the null hypothesis"
  },
  sep = ""))
## [1] "With a significance level of 0.1, we may not reject the null hypothesis"

Now for trial and error.

## Trial and Error

## Find the observed frequency of every class
o_i <-
  sapply(set.1.to.k, function(i) {
    sum(simulated.Z.prime.t.and.e >= l_i[i]
      & simulated.Z.prime.t.and.e < u_i[i]
    )
  })

## Calculate the test statistic
test.statistic <- sum(((o_i - e_i) ^ 2) / e_i)

## Conclude
print(paste(
  "With a significance level of 0.1, we ",
  if (test.statistic <= critical.value) {
    "may not reject the null hypothesis"
  } else {
    "may reject the null hypothesis"
  },
  sep = ""))
## [1] "With a significance level of 0.1, we may not reject the null hypothesis"

Effectively, it would seem that both our procedures for generating a sample from the truncated triangular distribution were successful. Let us move on to a test better suited for continuous distributions.

5.3.2 Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov test is more robust for goodness of fit procedures than the \(\chi^{2}\) test because it allows for samples of any size and it does not require that you split continuous distributions into bins or classes, which is undesirable as it introduces a source of error.

The procedure goes as follows:

  1. Order the random sample from smallest to largest, without removing duplicates (yields \(y_{j}, \forall j\in \{1,\cdots, n \}\)).

  2. Calculate the test statistic as follows:

\[ D^{+} = \max_{y_{j}, \forall j\in \{1,\cdots, n\}}{\{ j/n - H(y_{j})\}},\ D^{-} = \max_{y_{j}, \forall j\in \{1,\cdots, n\}}{\{ H(y_{j}) - (j-1)/n\}} \]

\[ D = \max{\{ D^{+}, D^{-}\}} \]

  3. If \(D\) is greater than the K-S critical value for \(\alpha = 0.1\) and \(n=10,000\), reject the null hypothesis.

The procedure seems fairly straightforward, except for the part where we calculate the K-S critical value. This time, at least in R, you might require some outside resources, such as a K-S table or a library that can simulate the statistic with which to compare. We will not implement this procedure in R since we already have evidence that the samples follow the truncated triangular distribution described in previous chapters.
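Still, the statistic \(D\) itself (though not its critical value) is simple to compute. A minimal sketch follows; the sample `z` and the CDF `H` are placeholder stand-ins (a uniform toy example), not the chapter's simulated data or the truncated triangular CDF:

```r
## Computing the K-S statistic D by hand on a placeholder sample.
set.seed(7)
n <- 1000
z <- runif(n, min = 3, max = 8)      ## placeholder data (assumption)
H <- function(x) (x - 3) / (8 - 3)   ## its known CDF

y <- sort(z)                         ## step 1: order the sample
j <- seq_len(n)
D.plus  <- max(j / n - H(y))         ## step 2: D+
D.minus <- max(H(y) - (j - 1) / n)   ## D-
D <- max(D.plus, D.minus)

## R's native ks.test accepts a CDF function and computes the same
## statistic along with an approximate p-value:
## ks.test(z, H)
```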

5.4 Further Analysis

Keep in mind that these tests all have their respective counterparts wherein a comparison is made not between a dataset and a distribution but between two datasets, meaning we could also ask ourselves whether or not both methods for simulating a probability density function produce similar samples in terms of variance, mean, and goodness of fit.
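A sketch of those two-sample counterparts, using placeholder normal samples standing in for the two simulation methods (the F test in particular assumes roughly normal data, so treat it with the same caution as the \(\chi^{2}\) variance test above):

```r
## Two-sample counterparts of the tests in this chapter.
## x and y are placeholder samples (assumption), standing in for the
## inverse CDF and trial and error vectors.
set.seed(11)
x <- rnorm(1000)
y <- rnorm(1000)

var.test(x, y)$p.value  ## F test: do the variances match?
t.test(x, y)$p.value    ## Welch t test: do the means match?
ks.test(x, y)$p.value   ## two-sample K-S: same distribution?
```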

Notice also that we could have used a much less restrictive level of significance. In fact, the smallest p-value among all tests would yield the largest \(\alpha\) for which every test would still pass, making our research look much more significant, but such is considered bad practice. Remember the term p-hacking for future reference and frown upon anyone who abuses statistics in this fashion.