The main goal of this project is to conduct a thorough time series analysis of monthly export data for the United States, spanning 23 years of historical records to uncover underlying patterns in monthly export trends. The objective is to understand how these patterns evolve over time, identify recurring seasonal fluctuations, long-term trends, and any irregularities that may impact export performance. The analysis encompasses U.S. international export monthly data from 2000 to 2023, employing five distinct models:
- A regression model featuring linear trend and seasonality.
- A two-level forecasting model combining regression with linear trend and seasonality, and trailing MA (k=6) for residuals.
- Another two-level forecasting model integrating regression with linear trend and seasonality, along with order-5 auto-regressive (AR5) for residuals.
- Holt-Winter’s model.
- Auto-ARIMA model.

The data is collected from U.S Census Bureau and is available in "Us_Monthly_exports_2000-2023.xlsx" file.
The data plots illustrate the international exports over a 23-year period from 2000 to 2023. The time series trend exhibits a mix of upward and downward trends. A significant decrease in exports is observed during the 2008-2009 period, attributed to the Great Recession, and a similar pattern is noted between 2019 and 2020 due to the impact of COVID-19. However, despite these downturns, the overall trend shows a consistent upward trajectory.
From the autocorrelation chart above we see that the data is highly correlated, as the autocorrelation coefficients in all the lags are substantially higher than the horizontal threshold (significantly greater than zero). For all the lags, the autocorrelation coefficients have a positive value. A positive autocorrelation coefficient in lag 1 is substantially higher than the horizontal threshold, which is indicative of an upward trend component. A positive autocorrelation coefficient in lag 12, which is also statistically significant, points to a seasonal component being present in the data. It can be concluded that the data is not just comprised of a level component.
The primary objective of this analysis is to conduct a z-test using the AR(1) model to assess the significance of the autocorrelation at lag 1. The calculated z-statistic was -0.8571429, resulting in a p-value of 0.195683. Since the p-value exceeds the chosen significance level (p-value > 0.05), the decision is made to accept the null hypothesis. Further investigation is done using the autocorrelation in first-differenced data, utilizing the Acf() function in R. The autocorrelation chart for the first differencing data is shown below.
Although not all autocorrelation coefficients of the first differenced data are statistically significant, only the lag-1 autocorrelation is significant. Nevertheless, it is still valuable to explore different models as it is not a random walk.
The dataset is divided into two parts – Training and Validation. The training is used to train the forecasting models and the set consists of 240 records from the period of January 2000 to December 2019. The validation set is used to validate the performance of the forecasting models and has 48 records from the period of January 2020 to December 2023.
The summary of the regression model with linear trend and seasonality for the training set is shown below.
The linear regression model has 12 predictors- 1trend + 11 dummy variables for seasonality feb (season2) to dec (season12). All the seasonal variables are statistically insignificant. The intercept of the model is 53999.583. The model has a R-squared of 0.884 and adj. R_squared of 0.8779. The regression model equation is:
yt = 53999.583 + 393.286 t + 371.214 D2 + ………-71.649 D12
Two-Level Forecast- Regression model with linear trend and seasonality + Trailing MA (k=6) for residuals
Trailing MA with a window width of 6 is trained using the above linear regression model residuals. The below plot shows regression residuals and trailing MA residuals for training and validation partition.
The table containing validation partition data (Exports), regression forecast (Regression.Fst), MA forecast for regression residuals (MA.Residuals.Fst), and combined (2-level) forecast (Combined.Fst(MA1)) that combines the two previous forecasts is shown below:
The autocorrelation chart (correlogram) of the residuals from the regression model with linear trend and seasonality is provided below.
The plot indicates notable autocorrelation among the residuals across lag intervals 1 to 12. This suggests that the regression model does not account for these autocorrelations among the residuals. Consequently, by incorporating these residual autocorrelations using an autoregressive (AR) model and implementing a two-level forecasting approach, the forecast could potentially be enhanced. The correlogram for the residuals of the AR(1), AR(2), AR(3), AR(4) and AR(5) model (residuals of residuals) are shown below
The correlogram reveals that the autocorrelations for the AR(5) model's residuals appear to be random, indicating that the AR(5) model has successfully captured significant autocorrelation at all lags. Consequently, integrating the AR(5) model for residuals with the regression model could enhance the forecasting of the time series. The summary of the AR(5) model for the regression residuals is shown below
The AR(5) model’s equation is:
et = -39.2819 + 0.9528 et-1 + 0.1910 et-2 - 0.0024 et-3 - 0.0628 et-4 - 0.1194 et-5
The below plot shows regression residuals and AR(5) residuals for training and validation partition.
The table containing validation partition data (Exports), regression forecast (Regression.Fst), AR(5) forecast for regression residuals (AR(5).Residuals.Fst), and combined (2-level) forecast (Combined.Fst(AR5)) that combines the two forecasts is shown below:
The summary of the Holt-Winter’s (HW) model with automated selection of error, trend and seasonality options, and automated selection of smoothing parameters for the training partition is shown below.
The HW model has the (M,Ad,N) options which indicates multiplicative error, additive trend, and no seasonality. The optimal value for exponential smoothing constant (alpha) is 0.7249, the smoothing constant for trend estimate (beta) is 0.3264, and damping constant (phi) is 0.8.
The summary of the auto-ARIMA model is for the training period is shown below.
This is a non-seasonal ARIMA model, (2,1,2), with the following parameters: • p = 2, order-2 autoregressive model AR(2) • d = 1, first differencing • q = 2, order 2 moving average MA(2) for error lags The ARIMA model’s equation is:
yt - yt-1 = 1.1970(yt-1 -yt-2) – 0.5232(yt-2 -yt-3) – 1.2335εt-1 + 0.7166εt-2 + 311.4698
In ARIMA (AutoRegressive Integrated Moving Average) models, drift(311.4698) refers to a constant term added to the model to account for long-term trends or shifts in the data that are not captured by the autoregressive or moving average components. In a basic ARIMA model, the drift parameter is denoted as "d" and is typically included in models with the integrated component (the "I" in ARIMA), which represents differencing to make the time series stationary. The drift term allows the model to capture linear trends in the data.
The 5 models are trained on entire dataset and the table below shows the forecast of the five models in the future of 2024
The performance measures of the models along with base-line models on entire dataset is shown below
Based on above table the best performing models are:
- Regression with linear trend and seasonality + AR(5)
- Holt’s Winter model
- Auto-ARIMA model
After thorough analysis of both Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) values, the Regression model with linear trend and seasonality combined with autoregressive component of order 5 for residuals emerges as the optimal choice among the evaluated models for U.S. international exports analysis. Although the MAPE for this model slightly surpasses that of Auto-ARIMA (1.96% compared to 1.953%), its RMSE significantly outperforms Auto-ARIMA, standing at 3302.002 compared to 3461.523. Therefore, the Regression model with linear trend and seasonality combined with AR(5) for residuals is deemed the best model. Additionally, this model exhibits lower MAPE and RMSE values compared to baseline models, further supporting its superiority for the analysis of U.S. international exports
- Sai Harsha Vardhan Reddy, Kolan- skolan@horizon.csueastbay.edu, harsha62334@gmail.com
Thanks for reading!