# Statistics Topics

**Alpha Risk**- The probability of rejecting a null hypothesis when it is true. It is the probability of making a Type I error.
**Arithmetic Mean**- The arithmetic mean of n numbers is the sum of the numbers divided by n.
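**Autocovariance**- The covariance of a process with a time-shifted copy of itself, expressed as a function of the time lag.
**Autoregressive Moving Average Model**- A model combining an autoregressive part (an all-pole filter) with a moving average part (an all-zero filter); essentially a pole-zero infinite impulse response filter with some additional interpretation placed on it.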
**Average Deviation**- The absolute value of the difference from the mean for each data value, summed, then divided by the number of values.
**Bernoulli Trial**- An experiment with only two possible outcomes, such as success or failure, heads or tails, or good or bad.
**Beta Distribution**- A distribution used for continuous random variables which are constrained to lie between 0 and 1.
**Beta Risk**- The probability of not rejecting a null hypothesis when it is false. It is the probability of making a Type II error.
**Binomial Distribution**- A distribution which gives the probability of observing a given number of successes in a fixed number of independent Bernoulli trials.
**Capability Indices**- Indices computed in the process capability procedure to measure how well a sample of data conforms to process specifications.
**Class Boundary**- A point that is the left endpoint of one class interval, and the right endpoint of another class interval.
**Class Interval**- The non-overlapping intervals of a histogram.
**Complementary Probability**- With probabilities expressed in decimal form, the probability that an event does not occur; an event's probability and its complementary probability sum to one.
**Concordant**- A pair of cases for two ordered data variables in which values for the first case are either both higher or both lower than the values of the variables for the second case.
**Conditional Probability**- The probability of an event occurring given that another event also occurs.
**Confidence Interval**- A range of values which is believed, with a preassigned degree of confidence, to include the true characteristic of the lot or universe.
**Confidence Level**- The degree of desired trust or assurance in a given result.
**Confidence Limits**- The upper and lower extremes of the confidence interval.
**Continuous Data**- Data whose values may take on any value within a finite or infinite interval.
**Controlled Experiment**- An experiment that uses the method of comparison to evaluate the effect of a treatment by comparing treated subjects with a control group, who do not receive the treatment.
**Covariance**- A measure of the joint variability of a pair of numeric variables.
**Cumulative Probability**- The probability that a random variable will be less than or equal to a specified value.
**Data**- A series of facts or statements that may have been collected, stored, processed or manipulated but have not been organized.
**Data Mining**- Using automated data analysis techniques to find themes or relationships.
**Data Processing**- The execution of a systematic sequence of operations performed upon data. Synonymous with information processing.
**Deciles**- The 10th, 20th, 30th, ..., 90th percentile points.
**Discordant**- A pair of cases for two ordered data variables in which the value of one variable for the first case is higher or lower than its value in the second case, and the relative relationship is switched for the second variable.
**Distribution**- A probability function which describes the relative frequency of occurrence of data values when sampled from a population.
**Double Blind Experiment**- An experiment in which neither the subjects nor the people evaluating the subjects know who is in the treatment group and who is in the control group.
**End Point Convention**- In histograms, you need to decide where to count values that are on the exact boundary between two intervals: either in the left or in the right interval.
**Experimental Probability**- The chances of something happening, based on repeated testing and observing results.
**Frequency Distribution**- An organized display of a set of data that shows how often each different piece of data occurs.
**Gaussian Curve**- Normal Distribution.
**Gaussian Distribution**- A continuous probability distribution that often gives a good description of data that cluster around the mean.
**Geometric Mean**- A statistic calculated by multiplying n data values together and taking the n-th root of the result.
**Harmonic Mean**- The harmonic mean of two numbers a and b is 2ab/(a + b). More generally, the harmonic mean of n numbers is n divided by the sum of their reciprocals.
**Histogram**- A graphical display showing the distribution of data values in a sample by dividing the range of the data into non-overlapping intervals and counting the number of values which fall into each interval.
**Inter-Quartile Range**- For a list of numbers this is the upper quartile minus the lower quartile.
**Joint Probability**- The probability of two or more events happening at the same time.
**Law of Averages**- The average of independent observations of random variables that have the same probability distribution is increasingly likely to be close to the expected value of the random variables as the number of observations grows.
**Law of Large Numbers**- In repeated, independent trials with the same probability p of success in each trial, the percentage of successes is increasingly likely to be close to the chance of success as the number of trials increases.
**Least Squares**- Any statistical procedure that involves minimizing the sum of squared differences.
**Lower Quartile**- The 25th percentile, calculated by ordering the data from smallest to largest and finding the value which lies 25% of the way up through the data.
**Maximum Likelihood Estimate**- The value of a parameter that maximizes the likelihood function, that is, the value under which the observed data are most probable.
**Mean**- The sum of all values in the data, divided by the number of values.
**Mean Time to Failure**- The measured operating time of a system or component divided by the number of failures that occurred during that time.
**Mode**- The most frequently occurring value in a sequence of numbers.
**Multimodal Distribution**- A distribution with more than one mode.
**Negative Binomial Distribution**- A discrete probability distribution useful for characterizing the number of Bernoulli trials needed to obtain a fixed number of successes. Sometimes called the Pascal distribution.
**Normal Distribution**- A continuous probability distribution that often gives a good description of data that cluster around the mean. Also known as Gaussian Distribution or Bell Curve.
**Odds**- A statement of the probability that an event will happen relative to the probability that it will not happen.
**Ordinal Data**- A set of data is said to be ordinal if the values belonging to it can be put in order or have a rating scale attached.
**Pareto Distribution**- A heavy-tailed distribution used for random variables which are constrained to be greater than or equal to a positive minimum value.
**Partial Correlation**- A measure of the strength of the relationship between two or more numeric variables having accounted for their joint relationship with one or more additional variables.
**Pascal Distribution**- A discrete probability distribution useful for characterizing the number of Bernoulli trials needed to obtain a fixed number of successes. See Negative Binomial Distribution.
**Poisson Distribution**- A distribution often used to express probabilities concerning the number of events per unit.
**Population**- The complete set of individuals, items, or values about which conclusions are to be drawn, and from which samples are taken.
**Probability**- A number between 0 and 1 which represents how likely an event is to occur.
**Process Capability**- A measurable property of a process relative to its specification.
**Pure Error**- Variability between observations made at the same values of the independent variable or variables.
**Quartiles**- Statistics which divide the observations in a numeric sample into 4 intervals, each containing 25% of the data.
**Random Experiment**- An experiment or trial whose outcome is not perfectly predictable, but for which the long-run relative frequency of outcomes of different types in repeated trials is predictable.
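**Randomised Controlled Experiment**- A controlled experiment in which the assignment of subjects to the treatment group or control group is done at random, e.g. by drawing straws.
**Rayleigh Distribution**- A continuous distribution for non-negative random variables, such as the magnitude of a two-dimensional vector whose components are independent, zero-mean normal variables with equal variance. An example is the variation of wave height in a sea where swell is the main component.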
**Relative Standard Deviation**- A measure of precision, calculated by dividing the standard deviation for a series of measurements by the average measurement.
**Residual**- The observed value minus the predicted value.
**Residual Plot**- A plot of the residuals from the regression against the explanatory variable.
**Sample**- A set of observations, usually considered to have been taken from a much larger population.
**Sample Size**- The number of elements in a sample from a population.
**Sample Survey**- A survey based on the responses of a sample of individuals, rather than the entire population.
**Skewed Distribution**- A distribution that is not symmetrical.
**Spatial Sampling**- Sampling in two or more dimensions.
**SPC**- Abbreviation of Statistical Process Control.
**Standard Deviation**- The square root of the variance.
**Standard Error**- The standard deviation divided by the square root of the number of data values; this is the standard error of the mean.
**Standard Normal Distribution**- A normal distribution with a mean equal to 0 and a standard deviation equal to 1.
**Statistic**- Anything that can be calculated from a sample of data.
**Statistical Model**- A statistical model is used to describe the relationship between a dependent variable Y and one or more independent variables.
**Statistical Process Control**- Statistical techniques to measure and analyse the extent to which a process deviates from a set standard.
**Student's t Distribution**- A probability distribution which is very similar in shape to the standard normal distribution, but with heavier tails.
**Summary Statistics**- A single number representation of the characteristics of a set of data. Usually given by measures of central tendency and measures of dispersion.
**Target Population**- The entire group a researcher is interested in, the group about which the researcher wishes to draw conclusions.
**Theoretical Probability**- The chances of events happening as determined by calculating results that would occur under ideal circumstances.
**Time Series**- A sample of data values collected at equally spaced points in time.
**t test**- A hypothesis test based on Student's t distribution.
**Type I Error**- Incorrectly rejecting a true null hypothesis. The probability of such an error is the alpha risk.
**Type II Error**- Not rejecting a false null hypothesis. The probability of such an error is called the beta risk.
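**Unbiased**- Having no bias. An estimator is unbiased if its expected value equals the true value of the quantity being estimated.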
**Uncontrolled Experiment**- An experiment in which there is no control group.
**Upper Bound**- A plausible upper limit to the true value of a quantity, usually not a true statistical confidence limit.
**Upper Quartile**- The 75th percentile, calculated by ordering the data from smallest to largest and finding the value which lies 75% of the way up through the data.
**Variance**- The square of the difference from the mean for each data value, summed and divided by one less than the number of values.
**Weibull Distribution**- A distribution used for random variables which are constrained to be greater than or equal to 0.
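
Several of the entries above (Arithmetic Mean, Average Deviation, Geometric Mean, Harmonic Mean, Variance, Standard Deviation, Standard Error and Relative Standard Deviation) describe a calculation in words. The following is a minimal Python sketch of those calculations, following the wording of the definitions. The function names and the sample data are illustrative assumptions and do not come from the glossary; Python 3.8 or later is assumed for `math.prod`.

```python
import math


def arithmetic_mean(values):
    """Sum of the numbers divided by how many there are."""
    return sum(values) / len(values)


def average_deviation(values):
    """Absolute differences from the mean, summed, divided by n."""
    mean = arithmetic_mean(values)
    return sum(abs(x - mean) for x in values) / len(values)


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    """n divided by the sum of reciprocals; 2ab/(a + b) for two values."""
    return len(values) / sum(1 / x for x in values)


def variance(values):
    """Squared differences from the mean, summed, divided by n - 1."""
    mean = arithmetic_mean(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


def standard_error(values):
    """Standard deviation divided by the square root of n."""
    return standard_deviation(values) / math.sqrt(len(values))


def relative_standard_deviation(values):
    """Standard deviation divided by the average measurement."""
    return standard_deviation(values) / arithmetic_mean(values)


# Illustrative data only (an assumption, not taken from the glossary).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(arithmetic_mean(data))    # 5.0
print(average_deviation(data))  # 1.5
print(variance(data))           # 32 / 7, since the squared differences sum to 32
print(standard_error(data))
print(relative_standard_deviation(data))
```

The standard library's `statistics` module provides `mean`, `geometric_mean`, `harmonic_mean`, `variance` and `stdev`, which would normally be preferred; the hand-written versions here are only meant to mirror the definitions step by step.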

**Subjects:** Mathematics