Master the vocabulary of data science. Each term explained with intuition, math, and practical context.
Your data's personality doesn't change over time. The mean, variance, and autocorrelation stay consistent whether you look at January or July.
library(tseries)
# Run ADF test for stationarity
adf.test(my_data$Close)
# If p-value > 0.05, data is non-stationary
# Apply differencing to make it stationary
stationary_data <- diff(my_data$Close)

Most time series models assume stationarity. If your data has trends or changing variance, predictions will be unreliable. You need to difference or transform non-stationary data first.
[Figure: Stationary vs. non-stationary series]
The general direction your data is heading over time. Is it going up, down, or staying flat?
library(stats)
# Decompose time series into components
decomposed <- decompose(ts(my_data$Close, frequency = 12))
# Extract the trend component
trend <- decomposed$trend
plot(trend, main = "Trend Component")

Identifying trends helps you understand long-term patterns. Removing trends (via differencing) is often the first step to making data stationary.
Repeating patterns at regular intervals. Like how ice cream sales spike every summer or how traffic increases every Monday morning.
# Create seasonal time series (monthly data)
ts_data <- ts(my_data$Close, frequency = 12, start = c(2024, 1))
# Decompose to extract seasonal component
decomposed <- decompose(ts_data)
seasonal <- decomposed$seasonal
# Plot seasonal pattern
plot(seasonal, main = "Seasonal Component")

Ignoring seasonality leads to poor forecasts. Models like SARIMA explicitly handle seasonal patterns to improve accuracy.
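Since the note above mentions SARIMA, here is a minimal sketch of fitting one with the forecast package, assuming the monthly ts_data built above; the (1, 1, 1) orders are illustrative, not a recommendation.
library(forecast)
# Seasonal ARIMA: non-seasonal (p, d, q) plus seasonal (P, D, Q) terms
sarima_model <- Arima(ts_data, order = c(1, 1, 1), seasonal = c(1, 1, 1))
summary(sarima_model)
# Or let auto.arima() pick the seasonal structure from the ts frequency
auto_seasonal <- auto.arima(ts_data)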
How much today's value is related to values from previous days. It's like asking 'does yesterday predict today?'
# Plot Autocorrelation Function
acf(my_data$Close, main = "ACF Plot")
# Get numeric ACF values (element 1 is lag 0, which is always 1)
acf_values <- acf(my_data$Close, plot = FALSE)
print(acf_values$acf[1:10]) # lags 0 through 9
# Significant spikes beyond the blue dashed lines = useful lags

ACF plots help you identify the 'q' parameter in ARIMA models. Significant spikes at certain lags reveal the structure of your data.
[Figure: Sample ACF plot]
Instead of looking at actual values, you look at the change between consecutive values. It's like checking your daily weight change instead of total weight.
# First-order differencing (d = 1)
diff_1 <- diff(my_data$Close, differences = 1)
# Second-order differencing (d = 2)
diff_2 <- diff(my_data$Close, differences = 2)
# Plot original vs differenced
par(mfrow = c(2, 1))
plot(my_data$Close, type = "l", main = "Original")
plot(diff_1, type = "l", main = "First Difference")

Differencing removes trends and makes data stationary. The 'd' in ARIMA represents how many times you difference your data.
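One quick sanity check, as a sketch reusing the tseries package from earlier: re-run the ADF test on the differenced series to confirm that differencing worked.
library(tseries)
# Re-test the differenced series for stationarity
adf.test(diff_1)
# A small p-value here suggests the trend is gone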
A time shift. Lag-1 means 'one time period ago'. If you're looking at daily data, lag-7 means 'a week ago'.
# Create lagged versions of your data
library(dplyr)
df <- data.frame(
value = my_data$Close,
lag_1 = lag(my_data$Close, 1), # Yesterday
lag_7 = lag(my_data$Close, 7) # One week ago
)
# Lag plot to visualize autocorrelation
lag.plot(my_data$Close, lags = 4)

Lags are the building blocks of time series analysis. AR models use lagged values as predictors, and lag plots help visualize dependencies.
Predicting today based on yesterday (and maybe the day before). It's regression, but using your own past values instead of other variables.
library(forecast)
# Fit AR(2) model - predicting from 2 past values
ar_model <- Arima(my_data$Close, order = c(2, 0, 0))
# View AR coefficients (phi values)
print(ar_model$coef)
# The "p" in ARIMA(p, d, q) is the AR orderThe 'p' in ARIMA. AR components capture how past values influence the present. PACF helps you determine how many AR terms to include.
Predicting today based on past prediction errors. If you overshot yesterday, adjust today accordingly.
library(forecast)
# Fit MA(2) model - using 2 past errors
ma_model <- Arima(my_data$Close, order = c(0, 0, 2))
# View MA coefficients (theta values)
print(ma_model$coef)
# The "q" in ARIMA(p, d, q) is the MA orderThe 'q' in ARIMA. MA components smooth out noise and capture short-term dependencies. ACF helps you determine how many MA terms to include.
The Swiss Army knife of time series. Combines AutoRegressive (past values), Integrated (differencing), and Moving Average (past errors).
library(forecast)
# Fit ARIMA(1, 1, 1) model
arima_model <- Arima(my_data$Close, order = c(1, 1, 1))
# Or let R choose the best parameters
auto_model <- auto.arima(my_data$Close)
# Forecast next 30 periods
forecast_result <- forecast(auto_model, h = 30)
plot(forecast_result)

ARIMA is the go-to model for univariate time series forecasting. Master ARIMA and you can tackle most forecasting problems.
[Figure: ARIMA(p, d, q) components]
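One habit worth pairing with any ARIMA fit: check that the residuals look like white noise. A minimal sketch using the forecast package's checkresiduals() on auto_model from above:
library(forecast)
# Ljung-Box test plus a residual ACF plot in one call
checkresiduals(auto_model)
# A large Ljung-Box p-value suggests the residuals are uncorrelated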
A statistical test that asks: 'Is this data stationary?' It gives you a p-value to make the call.
library(tseries)
# Run Augmented Dickey-Fuller test
adf_result <- adf.test(my_data$Close)
# Check the p-value
print(adf_result$p.value)
# If p-value > 0.05: Non-stationary, increase d!
# If p-value < 0.05: Stationary, good to go!

Before fitting ARIMA, you need to know if differencing is required. An ADF p-value > 0.05 means you should difference the data (increase 'd').
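If you'd rather not eyeball p-values, here is a sketch of a common shortcut: ndiffs() from the forecast package estimates the number of differences needed via repeated unit-root tests.
library(forecast)
# Estimate how many differences are needed for stationarity
d <- ndiffs(my_data$Close)
print(d) # a reasonable starting value for "d" in ARIMA(p, d, q)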
Like ACF, but it removes the influence of intermediate lags. It shows the direct relationship between today and k days ago.
# Plot Partial Autocorrelation Function
pacf(my_data$Close, main = "PACF Plot")
# Get numeric PACF values
pacf_values <- pacf(my_data$Close, plot = FALSE)
print(pacf_values$acf[1:10])
# For an AR(p) process, the PACF cuts off after lag p
# This tells you the AR order (p) for ARIMA

PACF is your guide for choosing 'p' (AR order). Where the PACF cuts off sharply is often the right 'p' value.
[Figure: Sample PACF plot, cutting off at lag 2]
Using your model to predict future values. The further out you go, the less confident you should be.
library(forecast)
# Fit model and forecast 30 periods ahead
model <- auto.arima(my_data$Close)
fc <- forecast(model, h = 30)
# Plot with prediction intervals
plot(fc, main = "30-Period Forecast")
# Access point forecasts and intervals
print(fc$mean) # Point forecasts
print(fc$lower) # Lower bounds (80% & 95%)
print(fc$upper) # Upper bounds (80% & 95%)

The whole point! Good forecasts drive business decisions. Always report prediction intervals to show forecast uncertainty.
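To know whether those forecasts are any good, a common check is a hold-out split. A sketch, assuming my_data has comfortably more than 30 rows:
library(forecast)
n <- length(my_data$Close)
train <- my_data$Close[1:(n - 30)] # everything except the last 30 points
test <- my_data$Close[(n - 29):n]  # the last 30 points, held out
holdout_model <- auto.arima(train)
holdout_fc <- forecast(holdout_model, h = 30)
accuracy(holdout_fc, test) # RMSE, MAE, etc. on the held-out data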
Comparing group averages to see if at least one is different. Like asking 'Do these treatments actually have different effects?'
# Create sample data with 3 groups
group_A <- c(45, 48, 42, 47, 44)
group_B <- c(62, 65, 58, 61, 63)
group_C <- c(38, 35, 40, 37, 39)
# Run one-way ANOVA
data <- data.frame(
value = c(group_A, group_B, group_C),
group = factor(rep(c("A", "B", "C"), each = 5))
)
anova_result <- aov(value ~ group, data = data)
summary(anova_result)

When you have 3+ groups to compare, ANOVA tells you if the differences are real or just random noise. It's the gateway to understanding experimental results.
[Figure: Comparing group means]
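ANOVA only says that at least one group differs, not which one. A sketch of the standard follow-up, Tukey's HSD, applied to the aov fit above:
# Pairwise comparisons with family-wise error control
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)
plot(tukey_result) # confidence intervals for each pairwise difference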
The probability of seeing your data (or more extreme) if the null hypothesis were true. Small p-value = 'something interesting is happening'.
# T-test example
group1 <- c(23, 25, 28, 22, 24)
group2 <- c(30, 32, 29, 31, 33)
result <- t.test(group1, group2)
# Extract p-value
print(result$p.value)
# Common thresholds:
# p < 0.05 -> Statistically significant
# p < 0.01 -> Highly significant
# p < 0.001 -> Very highly significant

P-values guide decision-making. Below 0.05 typically means 'statistically significant', but context matters more than arbitrary thresholds.
A range of plausible values for your estimate. '95% confident the true value is between X and Y'.
# Sample data
data <- c(23, 25, 28, 22, 24, 26, 27, 25, 24, 26)
# Calculate 95% confidence interval for the mean
result <- t.test(data, conf.level = 0.95)
print(result$conf.int)
# Or manually (1.96 is the large-sample normal approximation;
# for n = 10, the t quantile qt(0.975, df = 9), about 2.26, is more accurate):
mean_val <- mean(data)
se <- sd(data) / sqrt(length(data))
ci_lower <- mean_val - 1.96 * se
ci_upper <- mean_val + 1.96 * se

Point estimates are incomplete. Confidence intervals communicate uncertainty and help you understand how precise your estimates really are.
How spread out your data is. High variance = wild swings. Low variance = stable and predictable.
# Sample data
data <- c(23, 25, 28, 22, 24, 26, 27, 25, 24, 26)
# Variance (sample variance with n-1)
sample_var <- var(data)
print(sample_var)
# Standard deviation (sqrt of variance)
std_dev <- sd(data)
print(std_dev)
# Population variance (n instead of n-1)
pop_var <- var(data) * (length(data) - 1) / length(data)

Variance appears everywhere in statistics. It's in confidence intervals, hypothesis tests, and model assumptions. Understanding spread is fundamental.
The classic average. Add everything up and divide by how many you have.
# Sample data
data <- c(23, 25, 28, 22, 24, 150) # Note the outlier!
# Calculate mean
mean_val <- mean(data)
print(mean_val) # Skewed by outlier
# Compare with median (robust to outliers)
median_val <- median(data)
print(median_val)
# Trimmed mean (drops the top and bottom 10% before averaging)
trimmed_mean <- mean(data, trim = 0.1)

The most common measure of central tendency. But beware: outliers can skew the mean dramatically. Consider the median for robust analysis.
Drawing the best-fit line through your data. Predicting Y using X.
# Simple linear regression
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 4.3, 5.8, 8.1, 9.9, 12.2, 13.8, 16.1, 18.0, 20.2)
# Fit the model: Y = b0 + b1*X
model <- lm(y ~ x)
summary(model)
# Extract coefficients
intercept <- coef(model)[1] # b0
slope <- coef(model)[2] # b1
# Predict new values
predict(model, newdata = data.frame(x = c(11, 12)))

Regression is the workhorse of predictive modeling. Time series models are essentially regression with past values as predictors.
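To make that last sentence concrete, here is a sketch that fits an AR(1) model as a plain lm() by regressing the series on its own lag-1 values, reusing the y vector above:
n <- length(y)
ar1_df <- data.frame(
  today = y[2:n],           # current value
  yesterday = y[1:(n - 1)]  # lag-1 value
)
ar_as_lm <- lm(today ~ yesterday, data = ar1_df)
summary(ar_as_lm) # the "yesterday" coefficient plays the role of phi in AR(1)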