Categories
Business Statistics Lecture Note

Normal Distribution Visualized (Interactive)

Please click here to open the full presentation:

You can:

  • See how the shape of the distribution changes with a different choice of \mu and \sigma.
  • Overlay another normal distribution with a different set of parameters.
Categories
Business Statistics

“The Area Under the Curve Concept” using Uniform Distribution

Please click the link to see an interactive version.

In this example

f(x)=\frac{1}{20} where 120 \leq x \leq 140

You can change the value of b to see how it changes the value of the integral, which is the area under the curve.
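
The area calculation can be sketched in a few lines of Python. This is only an illustration, and it assumes that b is the upper limit of integration starting from 120 (the interactive version may define b differently); the function name `uniform_area` is made up for this example.

```python
def uniform_area(b, lower=120.0, upper=140.0):
    """Area under f(x) = 1 / (upper - lower) between `lower` and `b`."""
    # Assumption: b is the upper integration limit; clamp it to the support.
    b = min(max(b, lower), upper)
    height = 1.0 / (upper - lower)   # f(x) = 1/20 in this example
    return (b - lower) * height      # a rectangle: width * height

for b in (120, 125, 130, 140):
    print(b, uniform_area(b))
```

Because the density is flat, the integral is just the area of a rectangle, so moving b to the right increases the area linearly until it reaches 1 at b = 140.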

Categories
Business Statistics Lecture Note

Business Statistics Lecture Slide

This is the lecture note I have developed for fall 2020 business statistics. I will develop the next half as my course progresses.

The slide is developed using LaTeX. So, I don’t really have a PPT version. If you want to use it as PPT, you can export all the pages as images, then import those images into PPT.

Some highlights of the lecture note:

  • Fewer chapters than most textbooks on the market (I did not omit any important topics; I just combined chapters so that there are fewer chapters to study.)
  • I did eliminate some topics that I personally do not use in the descriptive statistics part. (for example, stem-and-leaf, stacked bar chart, etc.)
  • Has a detailed topic index where you can conveniently jump to the point (From the students’ perspective, it is convenient for the review purpose.)
  • Introducing and emphasizing the idea of the distribution from earlier chapters. (Students typically struggle with the concept of distribution because it is so theoretical. I tried to demonstrate the formation of and the practical utility of using a distribution from early sections.)
  • There is a brief introduction on how to read and use mathematical notation, especially the summation notation. (Students struggle with this a lot. If they do not understand the notation, providing a formula sheet does not mean anything.)

Credit: I did use some images and questions from Andrew et al, Statistics for Business and Economics, 12th ed. For other sources, I tried to add the credit whenever possible. I tried to develop the visualization myself whenever possible.

Copyright: If you want to use it for your class, please just drop a line here in reply or message me on LinkedIn. I just want to make more friends. If you want my LaTeX code, I can share it with you if you agree to share your work with me afterwards, so that I can learn from you.

Download File:

BUAD2060_BusinessStatistics_Fall2020

Categories
Business Statistics

Data, Histogram, Distribution, and Probability

This set of videos explains the following ideas:

  • Video 1: How do you get a distribution from a bunch of numbers?
  • Video 2: What to look for from a distribution?
  • Video 3: How to utilize a distribution to get probability?

Categories
Business Statistics Excel Modeling Quality Management

From Population Distribution to Sampling Distribution (Simulation)

Sampling distribution is an important concept in statistics. We rely on sampling distributions (e.g., sample mean and sample proportion) to make decisions about whether to accept or to reject the hypotheses about the population properties.

Students usually have a tough time understanding the concept due to its highly theoretical nature. The attached Excel spreadsheet can help visualize the process of obtaining a sampling distribution of the sample mean. Specifically, it allows you to specify a sample size n and a number of sample groups k. You will then have k sample averages that can be used to construct a distribution. This Excel applet makes the process of obtaining the sampling distribution more visible.

Note:

  • Original file has a population of 10,000 observations that follow a normal distribution with mu = 500 and sigma = 50. If you wish to demonstrate the law of large numbers, you can replace the population data with your own.
  • When you specify a very big k, for example 400, Excel will freeze for a moment to process the request. Please monitor your CPU usage.
  • My website does not allow me to upload a macro-enabled Excel file as is. That is the reason why you are seeing a zip file.
  • When using it in Windows 10, please enable Macro, Data Analysis ToolPak, and Data Analysis ToolPak – VBA. Otherwise, it will report a run-time error.
  • For Mac, please see this link. Essentially you need to enable the developer ribbon.
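
For readers without Excel, the same idea can be sketched in Python. This is not the VBA code from the spreadsheet, only a minimal re-creation under stated assumptions: the population is simulated here (10,000 normal draws with mu = 500 and sigma = 50, as described in the note), and the function name `sampling_distribution`, the seeds, and the choice n = 30 are invented for illustration.

```python
import random
import statistics

def sampling_distribution(population, n, k, seed=0):
    """Return k sample means, each computed from a random sample of size n."""
    rng = random.Random(seed)
    # rng.sample draws without replacement within each sample group.
    return [statistics.fmean(rng.sample(population, n)) for _ in range(k)]

# Simulated stand-in for the population in the spreadsheet.
pop_rng = random.Random(42)
population = [pop_rng.gauss(500, 50) for _ in range(10_000)]

means = sampling_distribution(population, n=30, k=400)
print("mean of sample means:", round(statistics.fmean(means), 2))
print("s.d. of sample means:", round(statistics.stdev(means), 2))
print("theoretical sigma / sqrt(n):", round(50 / 30 ** 0.5, 2))
```

The standard deviation of the k sample means should land close to sigma / sqrt(n), which is the point the spreadsheet is meant to make visible.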

(Note: Written in Excel VBA)

I have another similar worked example in R here.

Categories
Business Statistics Quality Management R

Sampling Distribution and Testing Hypothesis

I developed this lecture note in spring 2020 using R Markdown for the first time. It supports compiling R, Markdown, and LaTeX code at the same time! I was really impressed.

http://www.compsaver.net/StatsNotes/02-13-2020%20Chapter%206.%20Statistical%20Techniques.html#/

Markdown Code:

Chapter 6. Statistical Techniques in Quality Management
========================================================
author: Z. Wen (OSCM 3340, Spring 2020)
date: 02/12/2020, Thursday
autosize: true
font-import: https://fonts.googleapis.com/css?family=Fira+Sans
font-family: 'Fira Sans', sans-serif
width: 1440
height: 900

Learning Objectives
========================================================

- Review of Sampling Distribution
- Confidence Interval
- Testing Hypothesis
- Various Distributions
- Sample Size Determination


Estimation
========================================================
Conceptually, the following relationship holds in any type of estimation. 
$$ \theta = \hat{\theta} + M.E.$$

In quality management, we are often interested in the process mean. (e.g., Do our machines need alignment?)
$$ \mu = \bar{x} + M.E. $$

We are also interested in the process standard deviation. (e.g., Do our machines need calibration?)
$$ \sigma = s + M.E. $$


Margin of Error
========================================================

**Since a lot of times we have information about $\bar{x}$ and $s$, we need to develop our knowledge on $M.E.$**

There are three components in developing the confidence interval 
- Your level of confidence  ($1-\alpha$)
- Sample size  ($n$)
- Best estimate of the population s.d.  ($\sigma$ or $s$, whichever available)

Here is the formal relationship of these three in forming the margin of error: 
$$M.E. = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Confidence Interval 
========================================================

**With the knowledge of M.E., we can construct an interval where the true parameter is located with $1-\alpha$ level of confidence.**

$$C.I. = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

There are many different types of variations. For example, 
$$C.I. = \bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$
$$C.I. = \bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}*(1-\bar{p})}{n}}$$

And many more... $C.I.$ for F distribution, $\chi^2$ distribution, etc. **As long as it is an estimation result, you will always see the reporting of $C.I.$**


Confidence Interval (Example)
========================================================

See the following calculated examples: 

| Level of $\alpha$ | $n$ |    $z$   | $\sigma$ |   M.E.   | $\bar{x}$ | C.I. | Interval Length |
|:-----------------:|:---:|:--------:|:--------:|:--------:|:---------:|:-------------------:|:---------------:|
|        0.01       | 100 | 2.58 |     4    | 1.03 |     34    |    [32.97, 35.03]   |     2.06    |
|        0.05       | 100 | 1.96 |     4    | 0.78 |     34    |   [33.22, 34.78]  |     1.57    |
|        0.1        | 100 | 1.64 |     4    | 0.66 |     34    |   [33.34, 34.66]  |     1.32    |
|        0.01       |  64 | 2.58 |     4    | 1.29 |     34    |   [32.71, 35.29]  |     2.58    |
|        0.05       |  64 | 1.96 |     4    | 0.98 |     34    |    [33.02, 34.98]   |     1.96    |
|        0.1        |  64 | 1.64 |     4    | 0.82 |     34    |   [33.18, 34.82]  |     1.64    |

<small>Although the name could be confusing, Excel formula **=CONFIDENCE.NORM(alpha, standard_dev, size)** and **=CONFIDENCE.T(alpha, standard_dev, size)** will give you the **M.E.** value. </small>
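
The table rows can be re-computed with a short Python sketch. It assumes a known $\sigma$ and uses the exact normal quantile rather than the rounded $z$ values, so tiny rounding differences are possible; the function name `margin_of_error` is made up for this example.

```python
from statistics import NormalDist

def margin_of_error(alpha, sigma, n):
    """M.E. = z_{alpha/2} * sigma / sqrt(n), for a known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / n ** 0.5

x_bar, sigma = 34, 4
for n in (100, 64):
    for alpha in (0.01, 0.05, 0.1):
        me = margin_of_error(alpha, sigma, n)
        print(f"alpha={alpha:<4} n={n:<3} M.E.={me:.2f} "
              f"C.I.=[{x_bar - me:.2f}, {x_bar + me:.2f}]")
```

A smaller $\alpha$ or a smaller $n$ both widen the interval, which is the pattern the table shows.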

One Very Important Application (1/4) - Confirming Doubts
========================================================
**How do you find out whether someone who had your total trust has betrayed you?** 
<br>

*I am almost certain that he/she won't do that...* 

But what if he/she got caught doing that very thing? Is he/she still trustworthy?

**You wholeheartedly believed the mean is 7. But what if your 95% confidence interval does not contain 7? Will you still believe the mean is 7?**

In this case, you either have to update your belief on the mean, or you must have encountered a rare chance event.


One Very Important Application (2/4)
========================================================

**Example:** <br>
A cylinder manufacturer claims that their process mean is 12.5 mm. Historically, their process standard deviation was .08 mm and there is no reason to think that the s.d. has changed. Upon drawing a random sample of 25, the sample average was 12.22 mm. Please test the claim at the 5% error tolerance level. 

*Questions*
1. First, please use a visual aid to determine the answer. <br>
2. Please use a formal approach to determine the probability of obtaining such sample average, given the true mean is 12.5 mm. <br>
3. What is the conclusion? 



One Very Important Application (3/4)
========================================================
If what the company claims is true, this will be the distribution of the population. 
- Population mean $\mu = 12.5$
- Population standard deviation $\sigma = 0.08$

<center>
![plot of chunk unnamed-chunk-1](Confidence Interval PowerPoint-figure/unnamed-chunk-1-1.png)
</center>

***
<small>
If what the company claims is true, for $n=25$, we have... 
- Hypothesized mean of $\mu_0 = 12.5$
- Standard Error of $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{.08}{5} = 0.016$
- 95% C.I. $\bar{x} \pm ME = [12.19, 12.25]$
</small>

<center>
![plot of chunk unnamed-chunk-2](Confidence Interval PowerPoint-figure/unnamed-chunk-2-1.png)
</center>



Formal Hypothesis Testing (4/4)
========================================================

**Hypothesis testing** re-defined: 
- It is a process of determining the likelihood of surprise for a given sample result.
- How likely is it that we would obtain a sample mean of this size, given that the claim is true?

Formal hypothesis: <br>
- $H_0: \mu = 12.5$ and $H_a: \mu \neq  12.5$ 

Determination of the Likelihood (Excel Formula): 
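- $p\text{-value} = \text{norm.dist}(12.22, 12.5, 0.016, \text{TRUE}) = 7.16\text{E-}69 \approx 0$ (this is the left-tail probability; doubling it for the two-sided test still gives $\approx 0$)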

**Verdict: It is very unlikely that the true mean is 12.5**

Any idea about the true mean? 
- All we know is that it is very unlikely to be 12.5. With 95% confidence, we can say it is within [12.19, 12.25].
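
The same tail probability can be checked outside Excel. The sketch below uses only the Python standard library and the numbers from the slides ($\mu_0 = 12.5$, $\sigma = 0.08$, $n = 25$, $\bar{x} = 12.22$); the variable names are chosen for this example. It uses `math.erfc` instead of a generic normal CDF because the tail is so small that a naive `1 + erf(...)` computation could round to zero.

```python
import math

mu_0, sigma, n, x_bar = 12.5, 0.08, 25, 12.22

standard_error = sigma / math.sqrt(n)      # 0.08 / 5 = 0.016
z = (x_bar - mu_0) / standard_error        # about -17.5

# Left-tail probability P(X-bar <= 12.22) under the claimed mean,
# the same quantity as the cumulative norm.dist call in Excel.
left_tail = 0.5 * math.erfc(-z / math.sqrt(2))

# Two-sided p-value: a surprise in either direction counts.
p_two_sided = 2 * min(left_tail, 1 - left_tail)

print(f"z = {z:.2f}")
print(f"left tail = {left_tail:.3g}, two-sided p-value = {p_two_sided:.3g}")
```

Either way the p-value is far below 0.05, so the conclusion does not depend on whether one tail or two tails are counted.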


Another Example - Two Group Mean Testing (Exercise)
========================================================

**Case**: Please determine whether the following two hospitals have the same quality rating.
- Data URL: [Download Data](https://blackboard.utdl.edu/bbcswebdav/pid-7699937-dt-content-rid-63616690_1/xid-63616690_1) <small>(*UTAD ID/PW Needed for Access*) </small>







<center>
<img src="Confidence Interval PowerPoint-figure/unnamed-chunk-5-1.png" title="plot of chunk unnamed-chunk-5" alt="plot of chunk unnamed-chunk-5" width="1000" height="600" />
</center>



Formalizing Two Group Mean Test
========================================================
A visual inspection of the confidence interval seems to be arguing that the means are not the same. Now, we formalize the test. 
- The mean difference $d = \bar{x}_A - \bar{x}_B$ is known to follow a **T distribution** when both $\sigma_A$ and $\sigma_B$ are unknown.
- The T distribution gives a wider, more conservative estimate than the Z distribution as long as the degrees of freedom (*formula omitted*) are less than 120.
- A **formal hypothesis**: $H_0: \mu_A - \mu_B = 0$ and $H_a: \mu_A - \mu_B \neq 0$
- Construct a hypothesized distribution with a mean of $0$ and a standard error of $\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$
- Then draw samples from each group and calculate the mean difference $d$ to determine how likely (through the $p$-value) it is to obtain such a result when the two groups are assumed to have no difference.

[Tool from ArtofStat (Two Mean Test)](https://istats.shinyapps.io/2sample_mean/) 
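
A minimal Python sketch of the test statistic for the steps above. It assumes the unequal-variance (Welch) form, which matches the standard error on this slide, and it fills in the omitted degrees-of-freedom formula with the Welch-Satterthwaite approximation. The ratings below are invented for illustration and are not the course data; the p-value itself is left to Excel or the linked tool because the Python standard library has no T distribution.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (d, standard error, t statistic, Welch-Satterthwaite df)."""
    n_a, n_b = len(sample_a), len(sample_b)
    v_a = statistics.variance(sample_a) / n_a   # sample variance / n, group A
    v_b = statistics.variance(sample_b) / n_b   # sample variance / n, group B
    d = statistics.fmean(sample_a) - statistics.fmean(sample_b)
    se = math.sqrt(v_a + v_b)
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return d, se, d / se, df

# Hypothetical ratings, invented for illustration only.
hospital_a = [8.1, 7.9, 8.4, 8.0, 8.3, 7.8]
hospital_b = [7.2, 7.5, 7.1, 7.6, 7.4, 7.0]

d, se, t, df = welch_t(hospital_a, hospital_b)
print(f"d = {d:.3f}, se = {se:.3f}, t = {t:.2f}, df = {df:.1f}")
```

The resulting `t` and `df` are the inputs that a T distribution lookup needs.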


Excel Solution for Two Group T-Test
========================================================

**Excel Data Analysis ToolPak**

![Analysis ToolPak in Excel](https://www.excel-easy.com/examples/images/t-test/select-t-test-two-sample-assuming-unequal-variances.png)

Also as Excel Formulae

$=T.DIST(X, DF, TRUE)$
$=T.DIST.2T(X, DF)$
$=T.DIST.RT(X, DF)$



***

**Side of the Test**

![Which side are you testing?](https://ars.els-cdn.com/content/image/3-s2.0-B9780128008522000092-f09-06-9780128008522.jpg)

How would you define a surprise? 
- Too big is a surprise (Right-Tail)
- Too small is a surprise (Left-Tail)
- Too big or too small are both surprises (Two-Tail)




Overview of Distributions and Their Usages
========================================================

Different test statistics assume different statistical distributions. Here is what is relevant to our current context. 

<center>

|                              | Testing for One Group | Comparing Two Groups |
|------------------------------|-----------------------|----------------------|
| Mean                         | Z distribution        | Z distribution       |
| Mean (When $\sigma$ unknown) | T distribution        | T distribution       |
| Variance                     | $\chi^2$ distribution | F distribution       |

</center>

In the quality context, we wish to know
- Whether the machine needs alignment (parts are even, but miss the target)
- Whether the machine needs re-calibration (parts are uneven, but meet the target)


Overview of Distributions and Their Usages (cont.)
========================================================

Other types of distributions and their potential usage: 

<center>

| Use Case                              | Distributions   | Variable Type |
|---------------------------------------|-----------------|---------------|
| Number of defective units per x units | Poisson         | Discrete      |
| Time factor                           | Exponential     | Continuous    |
| Number of successes per x trials      | Binomial        | Discrete      |
| Sampling without replacement          | Hyper-geometric | Discrete      |

</center>

Many times, statistical distributions can be used to establish important baselines for the estimation. 

**Now, back to the test of variance**

Test of Variance
========================================================

Detecting whether the variance (or $\sigma^2$) of the process has changed provides important information
- About the machine condition
- About the accuracy of the mean estimate
  
<center>
![Variance](https://www.qualitydigest.com/june08/Images/SimplifyingSPC/SimplifyingSPCFig7.gif)
</center>

***

<center>

<img src="Confidence Interval PowerPoint-figure/unnamed-chunk-6-1.png" title="plot of chunk unnamed-chunk-6" alt="plot of chunk unnamed-chunk-6" height="700" />

</center>


Test of Variance - Formalization (1/2) 
========================================================



**One group variance testing statistics** 

$$ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

It follows a $\chi^2$ distribution with $n-1$ degrees of freedom.
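
A small Python sketch of the statistic. The diameters and the hypothesized $\sigma_0 = 0.08$ below are assumptions made up for illustration, and the function name is hypothetical; the p-value lookup is left to Excel because the standard library has no $\chi^2$ distribution.

```python
import statistics

def chi_square_statistic(sample, sigma_0):
    """Return ((n - 1) * s^2 / sigma_0^2, degrees of freedom)."""
    n = len(sample)
    s2 = statistics.variance(sample)        # sample variance s^2
    return (n - 1) * s2 / sigma_0 ** 2, n - 1

# Hypothetical diameters in mm, invented for illustration only.
diameters = [12.41, 12.58, 12.47, 12.55, 12.39, 12.62, 12.50, 12.44]

chi2, df = chi_square_statistic(diameters, sigma_0=0.08)
print(f"chi-square = {chi2:.2f} with {df} degrees of freedom")
```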

***

<center>


Shape of $\chi^2$ Distribution

![chi-sq](https://saylordotorg.github.io/text_introductory-statistics/section_15/5a0c7bbacb4242555e8a85c9767c03ee.jpg)

$\chi^2$ distribution is useful when determining whether an observed pattern follows the expected pattern. 

</center>

Test of Variance - Formalization (2/2) 
========================================================


**Two group variance testing statistics**

$$F = \frac{s_1^2}{s_2^2}$$

It follows an $F$ distribution with $n_1 - 1$ numerator degrees of freedom and $n_2 - 1$ denominator degrees of freedom.

The $F$ distribution is the distribution of a ratio of two variances. 
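
As a rough illustration, the ratio of two sample variances and its two degrees of freedom can be computed as follows. The two samples are invented for this example and the function name is hypothetical; the tail probability is left to the Excel formulae.

```python
import statistics

def f_statistic(sample_1, sample_2):
    """Return (F, numerator df, denominator df) for a ratio of sample variances."""
    f = statistics.variance(sample_1) / statistics.variance(sample_2)
    return f, len(sample_1) - 1, len(sample_2) - 1

# Hypothetical measurements from two machines, invented for illustration only.
machine_1 = [12.41, 12.58, 12.47, 12.55, 12.39, 12.62]
machine_2 = [12.49, 12.51, 12.50, 12.52, 12.48, 12.50, 12.51]

f, df1, df2 = f_statistic(machine_1, machine_2)
print(f"F = {f:.2f} with ({df1}, {df2}) degrees of freedom")
```

An F value far from 1 suggests the two processes do not share the same variance.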

***

<center>
Shape of $F$ Distribution

![F](https://upload.wikimedia.org/wikipedia/commons/thumb/7/74/F-distribution_pdf.svg/1200px-F-distribution_pdf.svg.png)

</center>


Test of Variance - Excel Solutions
========================================================






![F](https://www.teststeststests.com/microsoft-office/excel-2016/tutorials/13-excel-data-analysis-toolpak/1-t-test-F-test-z-test/6-F-Test-inExcel.gif)

*** 
Excel Formulae:

$=F.Dist(F, DF1, DF2, TRUE)$
$=F.Dist.RT(F, DF1, DF2)$
$=F.Test(Range1, Range2)$



One More Application of Confidence Interval
========================================================
Sometimes, for budgetary reasons, we need to calculate the sample size before we conduct sampling. See if you can answer these two questions. 

**Example:** <br>
A manager wants to ensure that whenever he rejects a shipment, his chance of making a mistake is no more than 5%. Historically, the supplier's process had a very stable standard deviation of .5 mm. He believes 0.02 mm could serve as a meaningful margin of error. What would be his choice of sample size? That is, 

$$ 0.02 = 1.96 * \frac{.5}{\sqrt{n}}$$
Q: What is this n? 
<br><br>
**Solution:** To get the sample size: 
$$ 0.02 = 1.96 * \frac{.5}{\sqrt{n}}$$
$$ \sqrt{n} = \frac{1.96 * .5}{.02} = 49 $$
$$ n = \left(\frac{1.96 * .5}{.02}\right)^2 = 2401$$
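
The same rearrangement can be sketched in Python. The function name `required_sample_size` is made up for this example, and it uses the exact normal quantile instead of the rounded 1.96, then rounds up because a sample size must be a whole number.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, sigma, alpha=0.05):
    """Smallest n such that z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)

print(required_sample_size(margin=0.02, sigma=0.5))   # 2401
```

Halving the margin of error would roughly quadruple the required sample size, since $n$ grows with the square of $\sigma / M.E.$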

Sample Size Solution
========================================================