The following content was retrieved from https://blog.control-theory.com/entry/parametric-identification.


Classical Parametric System Identification: ARX, ARMAX, and the Prediction Error Method

This article provides a comprehensive explanation of classical parametric system identification — the approach of fitting polynomial input-output models to measured data using the prediction error method (PEM). It covers the four major model structures (ARX, ARMAX, Output-Error, Box-Jenkins), the prediction error framework for parameter estimation, model order selection, and the connection to subspace identification and the author's research. Related articles, research papers, and MATLAB links are placed at the bottom.

Author: Hiroshi Okajima, Associate Professor, Kumamoto University, Japan — 20 years of control engineering research

For the overview of system identification methods, see the hub article: System Identification: From Data to Dynamical Models — A Comprehensive Guide

Why Classical Parametric Methods Matter

Classical parametric methods are the most widely used and longest-established approach to system identification. They form the foundation of the field as described in the landmark textbook by Ljung (1999) and are implemented in every major system identification toolbox.

The basic idea is simple and intuitive:

  1. Assume a model structure — a parameterized relationship between past inputs, past outputs, and noise.
  2. Define a prediction error — the difference between the measured output and the output predicted by the model.
  3. Minimize the prediction error over the dataset to find the best parameter values.

This approach is powerful because:

  • It provides a unified framework for estimating models of varying complexity, from simple ARX to the fully general Box-Jenkins structure.
  • For the simplest case (ARX), estimation reduces to ordinary least squares — no iterative optimization is needed.
  • For more complex structures, the prediction error method (PEM) provides statistically efficient estimates that coincide with maximum likelihood under Gaussian noise.
  • The estimated model is in a form that is directly usable for transfer function analysis, controller design, and simulation.
  • MATLAB provides mature, well-tested implementations (arx, armax, oe, bj, pem) in the System Identification Toolbox.

This article covers both the model structures and the estimation algorithm (PEM) in a single unified treatment, because they are inseparable: the choice of model structure determines the form of the prediction error, and PEM is the algorithm that optimizes the parameters for any given structure.


Problem Setting

Consider a discrete-time single-input single-output (SISO) system. The measured output  y(k) is generated by an input  u(k) and corrupted by noise  e(k):

 \displaystyle y(k) = G(q, \theta)\,u(k) + H(q, \theta)\,e(k)

where  G(q, \theta) is the plant transfer function,  H(q, \theta) is the noise transfer function,  q is the forward shift operator ( qu(k) = u(k+1)), and  \theta is the parameter vector to be estimated. The noise  e(k) is assumed to be a zero-mean white noise sequence.

The goal of parametric identification is to estimate  \theta from the measured data  {u(k), y(k)}_{k=1}^{N}.
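As a concrete (hypothetical) instance of this setup, the plain-Python sketch below — the article's own examples use MATLAB — generates data from an assumed first-order system with  G = b_1 q^{-1}/(1 + a_1 q^{-1}) and  H = 1/(1 + a_1 q^{-1}), driven by a random binary input:

```python
import random

def simulate_first_order(a1, b1, N, noise_std=0.1, seed=0):
    """Generate data from y(k) = -a1*y(k-1) + b1*u(k-1) + e(k),
    i.e. G = b1 q^-1 / (1 + a1 q^-1) and H = 1 / (1 + a1 q^-1)."""
    rng = random.Random(seed)
    u = [rng.choice([-1.0, 1.0]) for _ in range(N)]  # random +/-1 input
    y = [0.0] * N
    for k in range(1, N):
        y[k] = -a1 * y[k - 1] + b1 * u[k - 1] + rng.gauss(0.0, noise_std)
    return u, y

u, y = simulate_first_order(a1=-0.7, b1=0.5, N=200)
```

The random ±1 input is chosen because parameter estimation needs a persistently exciting input; the parameter values here are illustrative only.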


Model Structures: From ARX to Box-Jenkins

The four classical polynomial model structures differ in how they parameterize  G(q, \theta) and  H(q, \theta). They form a natural hierarchy from simple to general.

ARX (Auto-Regressive with eXogenous Input)

The ARX model is the simplest structure:

 \displaystyle A(q)y(k) = B(q)u(k-n_k) + e(k)

where:

 \displaystyle A(q) = 1 + a_1 q^{-1} + a_2 q^{-2} + \cdots + a_{n_a} q^{-n_a}
 \displaystyle B(q) = b_1 + b_2 q^{-1} + \cdots + b_{n_b} q^{-n_b+1}

and  n_{k} is the input delay (number of samples). The plant and noise transfer functions are:

 \displaystyle G = \frac{B(q)}{A(q)}, \quad H = \frac{1}{A(q)}

The key feature of ARX is that the plant and noise share the same denominator  A(q). This coupling has a crucial computational consequence: the prediction error is linear in the parameters, so estimation reduces to ordinary least squares (OLS) — a non-iterative, globally optimal computation.

However, the shared denominator also means that the noise model affects the plant model and vice versa. If the true noise dynamics differ from  1/A(q), the ARX estimate of the plant may be biased.

ARMAX (Auto-Regressive Moving Average with eXogenous Input)

ARMAX adds an independent polynomial for the noise numerator:

 \displaystyle A(q)y(k) = B(q)u(k-n_k) + C(q)e(k)

where  C(q) = 1 + c_{1}q^{-1} + \cdots + c_{n_c}q^{-n_c}. The transfer functions are:

 \displaystyle G = \frac{B(q)}{A(q)}, \quad H = \frac{C(q)}{A(q)}

ARMAX provides more flexibility in modeling the noise characteristics compared to ARX, while the plant and noise still share the denominator  A(q). The estimation requires iterative optimization (PEM), because the prediction error is nonlinear in the  C(q) parameters.

Output-Error (OE)

The Output-Error model separates the plant dynamics completely from the noise:

 \displaystyle y(k) = \frac{B(q)}{F(q)}u(k-n_k) + e(k)

where  F(q) = 1 + f_{1}q^{-1} + \cdots + f_{n_f}q^{-n_f}. The transfer functions are:

 \displaystyle G = \frac{B(q)}{F(q)}, \quad H = 1

OE focuses entirely on modeling the deterministic (plant) dynamics — the noise is simply white noise added to the output. This is appropriate when the primary goal is to obtain an accurate plant model and the noise characteristics are not of interest.

Box-Jenkins (BJ)

The Box-Jenkins model is the most general structure, where both the plant and noise have independent numerators and denominators:

 \displaystyle y(k) = \frac{B(q)}{F(q)}u(k-n_k) + \frac{C(q)}{D(q)}e(k)

where  D(q) = 1 + d_{1}q^{-1} + \cdots + d_{n_d}q^{-n_d}. The transfer functions are:

 \displaystyle G = \frac{B(q)}{F(q)}, \quad H = \frac{C(q)}{D(q)}

BJ offers complete freedom in modeling both the plant and noise dynamics independently. All other structures can be viewed as special cases:

  • ARX:  F = A,  C = 1,  D = 1
  • ARMAX:  F = A,  D = 1
  • OE:  C = 1,  D = 1

Model Structure Hierarchy

The relationship between the four structures can be visualized as:

                    Box-Jenkins (BJ)
                   G = B/F,  H = C/D
                  /                  \
          ARMAX                    OE
       G = B/A, H = C/A        G = B/F, H = 1
          |
         ARX
      G = B/A, H = 1/A

Moving from ARX toward BJ increases the model flexibility but also increases the number of parameters and the difficulty of estimation (from least squares to iterative optimization).


The Prediction Error Method (PEM)

The Prediction Error

Given a model parameterized by  \theta, the one-step-ahead prediction of the output at time  k is:

 \displaystyle \hat{y}(k|\theta) = H^{-1}(q, \theta)\,G(q, \theta)\,u(k) + \bigl(1 - H^{-1}(q, \theta)\bigr)\,y(k)

This prediction uses the model to filter the input  u(k) and the past measured outputs  y(k-1), y(k-2), \ldots. The prediction error is:

 \displaystyle \varepsilon(k, \theta) = y(k) - \hat{y}(k|\theta)

For the ARX model, this simplifies to:

 \displaystyle \varepsilon(k, \theta) = A(q)\,y(k) - B(q)\,u(k-n_k)

which is linear in the parameters  \theta = (a_{1}, \ldots, a_{n_a}, b_{1}, \ldots, b_{n_b})^{T}.

The Cost Function

PEM estimates  \theta by minimizing the sum of squared prediction errors:

 \displaystyle V_N(\theta) = \frac{1}{N} \sum_{k=1}^{N} \varepsilon(k, \theta)^2

The parameter estimate is:

 \displaystyle \hat{\theta}_N = \arg\min_{\theta} V_N(\theta)

Optimization

For ARX models,  V_{N}(\theta) is a quadratic function of  \theta, so the minimum is found by solving a linear system of equations (ordinary least squares). This is computationally cheap and globally optimal.
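To show how small this computation is, here is a plain-Python sketch (the article's own snippets use MATLAB's arx) that fits a first-order ARX model  y(k) = -a_1 y(k-1) + b_1 u(k-1) + e(k) by solving the 2×2 normal equations directly; the system and noise level are assumed for illustration:

```python
import random

# --- generate data from a known first-order ARX system ---
rng = random.Random(1)
a1_true, b1_true, N = -0.7, 0.5, 400
u = [rng.choice([-1.0, 1.0]) for _ in range(N)]
y = [0.0] * N
for k in range(1, N):
    y[k] = -a1_true * y[k - 1] + b1_true * u[k - 1] + rng.gauss(0.0, 0.05)

# --- least squares: theta = (Phi^T Phi)^-1 Phi^T Y, phi(k) = [-y(k-1), u(k-1)] ---
S11 = S12 = S22 = r1 = r2 = 0.0
for k in range(1, N):
    p1, p2 = -y[k - 1], u[k - 1]            # regressors for a1 and b1
    S11 += p1 * p1; S12 += p1 * p2; S22 += p2 * p2
    r1 += p1 * y[k]; r2 += p2 * y[k]
det = S11 * S22 - S12 * S12                  # 2x2 normal-equation solve
a1_hat = (S22 * r1 - S12 * r2) / det
b1_hat = (S11 * r2 - S12 * r1) / det
```

The estimates land close to the true (a_1, b_1) in one shot: no iteration, no initial guess, which is exactly the practical appeal of ARX.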

For ARMAX, OE, and BJ models,  V_{N}(\theta) is a nonlinear function of  \theta, requiring iterative optimization. The standard algorithms used in practice are:

  • Gauss-Newton method — Approximates the Hessian using the Jacobian of the prediction errors. This is the most commonly used method and is the default in MATLAB.
  • Levenberg-Marquardt method — A damped version of Gauss-Newton that provides better convergence when the initial guess is far from the optimum.

Both methods require an initial parameter estimate. A common strategy is to first estimate an ARX model (which is globally optimal) and use its parameters to initialize the iterative optimization for more complex structures.
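To make the iteration concrete, here is a minimal scalar Gauss-Newton sketch in plain Python (not the toolbox implementation): a one-parameter output-error model  \hat{y}(k) = -f\,\hat{y}(k-1) + b\,u(k-1) with  b assumed known, where the Jacobian of the prediction error comes from a sensitivity recursion. The system and data are hypothetical and noiseless to keep the example short:

```python
import random

# Noiseless data from a one-parameter output-error model:
# y(k) = -f0*y(k-1) + b*u(k-1), with b known and f0 to be estimated.
rng = random.Random(3)
f0, b, N = 0.5, 1.0, 200
u = [rng.choice([-1.0, 1.0]) for _ in range(N)]
y = [0.0] * N
for k in range(1, N):
    y[k] = -f0 * y[k - 1] + b * u[k - 1]

f = 0.0                            # initial guess (in practice: from an ARX fit)
for _ in range(30):                # Gauss-Newton iterations
    yh = [0.0] * N                 # simulated model output
    s = [0.0] * N                  # sensitivity s(k) = d yh(k) / d f
    for k in range(1, N):
        yh[k] = -f * yh[k - 1] + b * u[k - 1]
        s[k] = -yh[k - 1] - f * s[k - 1]
    r = [y[k] - yh[k] for k in range(N)]        # prediction errors
    num = sum(sk * rk for sk, rk in zip(s, r))  # -J^T r, with J_k = -s(k)
    den = sum(sk * sk for sk in s)              # J^T J (a scalar here)
    f = max(-0.99, min(0.99, f + num / den))    # GN step, kept in the stable region
```

The clamp to |f| < 1 is a crude safeguard against an overshooting step leaving the stable region; real implementations use damping (Levenberg-Marquardt) instead.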

Statistical Properties

Under mild regularity conditions, the PEM estimate is:

  • Consistent:  \hat{\theta}_{N} \to \theta_{0} as  N \to \infty, where  \theta_{0} is the true parameter vector (assuming the true system belongs to the model class).
  • Asymptotically efficient: The estimate achieves the Cramér-Rao lower bound for the parameter covariance.
  • Equivalent to maximum likelihood under Gaussian noise: When  e(k) is Gaussian, minimizing  V_{N}(\theta) is equivalent to maximizing the likelihood function.

Local Minima

For nonlinear models (ARMAX, OE, BJ), the cost function  V_{N}(\theta) may have local minima. The iterative optimization may converge to a local minimum rather than the global one, depending on the initial parameter estimate. Strategies to mitigate this include:

  • Initializing with an ARX or subspace identification (N4SID) estimate.
  • Running the optimization from multiple initial points.
  • Using MATLAB's ssest function, which combines n4sid initialization with PEM refinement automatically.

Model Order Selection

Choosing the correct model orders ( n_{a}, n_{b}, n_{c}, n_{d}, n_{f}, n_{k}) is essential for obtaining a good model. Underfitting (orders too low) leads to bias, while overfitting (orders too high) leads to high variance.

Information Criteria

Akaike Information Criterion (AIC) balances model fit against complexity:

 \displaystyle \mathrm{AIC} = N \ln V_N(\hat{\theta}) + 2d

where  d is the number of estimated parameters. Bayesian Information Criterion (BIC) applies a stronger penalty for complexity:

 \displaystyle \mathrm{BIC} = N \ln V_N(\hat{\theta}) + d \ln N

The model with the lowest AIC or BIC is preferred.
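Both criteria are trivial to evaluate once the loss  V_N(\hat{\theta}) is available for each candidate order. A plain-Python sketch with hypothetical loss values (illustrative numbers only):

```python
import math

def aic(N, V, d):
    """AIC = N ln V + 2 d, for loss V = V_N(theta_hat) and d parameters."""
    return N * math.log(V) + 2 * d

def bic(N, V, d):
    """BIC = N ln V + d ln N."""
    return N * math.log(V) + d * math.log(N)

# Hypothetical losses from fitting models of increasing size to N = 500 samples:
losses = {2: 0.080, 4: 0.050, 6: 0.048, 8: 0.0478}
best_aic = min(losses, key=lambda d: aic(500, losses[d], d))
best_bic = min(losses, key=lambda d: bic(500, losses[d], d))
```

With these numbers both criteria select d = 6: the largest model's tiny loss reduction does not pay for its extra parameters.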

Residual Analysis

After estimation, the prediction errors (residuals) should be white noise — uncorrelated with each other and uncorrelated with the input. MATLAB's resid function performs both tests:

  • Autocorrelation test: The autocorrelation of the residuals should be approximately zero for all non-zero lags.
  • Cross-correlation test: The cross-correlation between the residuals and the input should be approximately zero for all lags.

If the residuals show systematic patterns, the model structure or order should be revised.
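The autocorrelation half of the test is easy to sketch in plain Python (resid additionally computes the cross-correlation with the input); the ±1.96/√N bounds are the standard 95% confidence limits for white residuals:

```python
import math
import random

def residual_autocorr(res, max_lag=20):
    """Normalized residual autocorrelation at lags 1..max_lag, plus the
    usual 95% whiteness bounds +/- 1.96/sqrt(N)."""
    N = len(res)
    m = sum(res) / N
    r = [x - m for x in res]
    c0 = sum(x * x for x in r) / N
    rho = [sum(r[k] * r[k - tau] for k in range(tau, N)) / (N * c0)
           for tau in range(1, max_lag + 1)]
    return rho, 1.96 / math.sqrt(N)

# For white residuals, roughly 95% of the lags should fall inside the bounds.
rng = random.Random(2)
rho, bound = residual_autocorr([rng.gauss(0.0, 1.0) for _ in range(1000)])
frac_inside = sum(abs(v) <= bound for v in rho) / len(rho)
```

A model whose residuals leave the bounds at many lags has unmodeled dynamics or a wrong noise model.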

Cross-Validation

A practical approach is to split the data into estimation and validation sets. The model is estimated on the first set and evaluated on the second. MATLAB's compare function compares the model's predicted or simulated output against the validation data, reporting a fit percentage.
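The fit percentage reported by compare is the normalized root-mean-square error (NRMSE) measure, easy to reproduce in plain Python on a toy validation record:

```python
import math

def fit_percent(y, y_hat):
    """NRMSE fit in percent: 100 * (1 - ||y - y_hat|| / ||y - mean(y)||),
    the goodness-of-fit measure reported by MATLAB's compare."""
    m = sum(y) / len(y)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)))
    spread = math.sqrt(sum((a - m) ** 2 for a in y))
    return 100.0 * (1.0 - err / spread)

y = [1.0, 2.0, 3.0, 4.0]
# A perfect prediction scores 100; always predicting the mean scores 0.
```

Note that the fit can be negative: a model worse than the constant mean predictor scores below zero.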


Comparison with Subspace Identification

Classical parametric methods and subspace identification (N4SID, MOESP, CVA) represent two fundamentally different approaches. The following table summarizes the key differences.

Aspect                 | Parametric (PEM)                    | Subspace (N4SID)
-----------------------|-------------------------------------|-------------------------
Model output           | Transfer function (polynomial)      | State-space matrices
Model structure        | Must be specified a priori          | Not required
Order selection        | AIC / BIC / residual analysis       | Singular value gap
Estimation method      | Iterative optimization (except ARX) | Non-iterative (SVD + LS)
Local minima           | Risk for ARMAX, OE, BJ              | None
MIMO systems           | Increasingly complex                | Naturally handled
Noise model            | Integral part of structure          | Optional
Statistical efficiency | Optimal (Cramér-Rao bound)          | Generally suboptimal
Initialization         | ARX or N4SID                        | Not needed

The ssest Function: Combining Both Approaches

MATLAB's ssest function bridges the two approaches: it uses n4sid (subspace identification) to obtain an initial state-space model, then refines it using PEM to minimize the prediction error. This combination provides the robustness of subspace methods (good initialization, no local minima risk) with the statistical optimality of PEM (Cramér-Rao efficiency).

% ssest = n4sid initialization + PEM refinement
sys = ssest(data, n);   % Estimate n-th order state-space model

This is particularly relevant for the author's research on cyclic reformulation-based identification, where the cycled system is first identified by N4SID (subspace step), and the result can optionally be refined by PEM for improved accuracy.

For a detailed treatment of subspace identification, see the blog article: Subspace Identification Methods: N4SID, MOESP, and CVA Explained


MATLAB Implementation

Estimating ARX Models

%% Data preparation
data = iddata(y, u, Ts);

%% ARX estimation
na = 3; nb = 2; nk = 1;       % Model orders and delay
sys_arx = arx(data, [na nb nk]);

%% Examine the result
compare(data, sys_arx);        % Compare with measured data
resid(data, sys_arx);          % Residual analysis

Estimating ARMAX, OE, and BJ Models

%% ARMAX
sys_armax = armax(data, [na nb nc nk]);

%% Output-Error
sys_oe = oe(data, [nb nf nk]);

%% Box-Jenkins
sys_bj = bj(data, [nb nc nd nf nk]);

Using PEM for General Models

The pem function can refine any model by minimizing the prediction error:

%% Refine an existing model using PEM
sys_refined = pem(data, sys_initial);

%% Estimate a state-space model with PEM (initialized by n4sid)
sys_ss = ssest(data, n);

Model Order Selection in MATLAB

%% Try a range of ARX orders using AIC
nn = struc(1:5, 1:5, 1:3);       % Test na=1..5, nb=1..5, nk=1..3
V = arxstruc(data_est, data_val, nn);
order = selstruc(V, 'aic');        % Select best order by AIC
sys_best = arx(data, order);

Complete Workflow Example

%% 1. Load data
data = iddata(y, u, Ts);
data_est = data(1:500);            % Estimation set
data_val = data(501:end);          % Validation set

%% 2. Start with ARX (globally optimal, no local minima)
sys_arx = arx(data_est, [3 2 1]);

%% 3. Try ARMAX for better noise modeling
sys_armax = armax(data_est, [3 2 2 1]);

%% 4. Try BJ for independent plant/noise modeling
sys_bj = bj(data_est, [2 2 2 2 1]);

%% 5. Compare all models on validation data
compare(data_val, sys_arx, sys_armax, sys_bj);

%% 6. Check residuals of the best model
resid(data_val, sys_bj);

Connection to the Author's Research

PEM and Model Error Compensation

In practice, every identified model contains errors due to finite data, noise, model structure limitations, and unmodeled dynamics. The Model Error Compensator (MEC) provides a systematic way to handle these errors: it uses the difference between the model output and the plant output to compensate for model inaccuracies, without modifying the existing controller.

The combination of PEM identification + MEC is a practical workflow:

  1. Identify a plant model using PEM (or subspace methods).
  2. Design a controller based on the identified model.
  3. Add MEC as an add-on compensator that corrects for modeling errors online.

This workflow is controller-independent — MEC works with PID, MPC, state feedback, or any other control law.

PEM and Subspace Identification in the Cyclic Reformulation Pipeline

In the author's cyclic reformulation approach for LPTV and multirate systems, the identification pipeline is:

Step 1 (Cyclic reformulation): Construct cycled signals from multirate/LPTV data

Step 2 (Subspace identification): Apply N4SID to the cycled signals

Step 3 (Coordinate transformation): Recover original parameters

The subspace method (N4SID) in Step 2 can optionally be followed by PEM refinement using ssest or pem, combining the initialization robustness of N4SID with the statistical optimality of PEM.

For the cyclic reformulation approach, see: Cyclic Reformulation-Based System Identification for Periodically Time-Varying Systems

For multirate system identification, see: System Identification Under Multirate Sensing Environments



Key References

    1. L. Ljung, System Identification: Theory for the User, 2nd ed., Prentice Hall, 1999.
    2. T. Söderström and P. Stoica, System Identification, Prentice Hall, 1989.
  • MATLAB System Identification Toolbox documentation: arx, armax, oe, bj, pem, ssest
  • GitHub (System identification basics): MATLAB system identification

Self-Introduction

Hiroshi Okajima — Associate Professor, Graduate School of Science and Technology, Kumamoto University. Member of SICE, ISCIE, and IEEE.


If you found this article helpful, please consider bookmarking or sharing it.

#SystemIdentification #PredictionErrorMethod #ARX #ARMAX #BoxJenkins #PEM #MATLAB #ControlEngineering #TransferFunction #ParameterEstimation



