Model Fitting

Some cases are too complex to describe with mathematical formulas, so we need to analyze data points instead. The tasks are:

Fitting a selected model type or types to the data.
Choosing the best-fitting model.
Making predictions.

Relationship Between Model Fitting and Interpolation

Model fitting involves finding a mathematical function that best represents the given data; the function does not necessarily pass through all the data points. Interpolation, by contrast, involves finding a function that passes exactly through every data point.
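To make the distinction concrete, here is a minimal sketch on three hypothetical data points. The interpolating polynomial (built with the Lagrange formula) reproduces every data value exactly, while a least-squares straight line (one fitting criterion, covered in 3.2) only approximates them. The helper names `lagrange_interpolate` and `least_squares_line` are written for this example and are not from the notes.

```python
# Hypothetical data points, chosen only for illustration.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 2.0]

def lagrange_interpolate(xs, ys, x):
    """Evaluate the polynomial that passes through every (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x minimizing the sum of squared deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

a, b = least_squares_line(xs, ys)
for x, y in zip(xs, ys):
    # The interpolant matches each data value; the fitted line generally does not.
    print(f"x={x}: data={y}, interpolant={lagrange_interpolate(xs, ys, x):.3f}, line={a + b * x:.3f}")
```

Here the fitted line is $y = 1.5 + 0.5x$, which misses every point, whereas the interpolant hits all three.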

Sources of Error in the Modeling Process

Formulation error: present in all models, because certain variables are neglected or simplified in the assumptions.
Truncation error: occurs when the numerical method used to solve a mathematical problem introduces inaccuracies, for example when an infinite series is cut off after finitely many terms.
Round-off error: occurs when numbers with infinitely many digits must be represented using only finitely many digits.
Measurement error: caused by imprecision in data collection.
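Two of these error sources are easy to observe directly. The sketch below is only an illustration: round-off error shows up because 0.1, 0.2 and 0.3 have no exact binary floating-point representation, and truncation error shows up when $e$ is approximated by the first few terms of its Taylor series. The helper `exp_taylor` is written for this example.

```python
import math

# Round-off error: 0.1, 0.2 and 0.3 have no exact binary representation,
# so the computed sum differs from the stored value of 0.3 in the last bits.
print(0.1 + 0.2 == 0.3)          # False
print(abs((0.1 + 0.2) - 0.3))    # tiny but nonzero

# Truncation error: approximate e = e^1 by cutting its Taylor series off after a few terms.
def exp_taylor(x, terms):
    """Sum of the first `terms` terms of the Taylor series of e^x about 0."""
    return sum(x ** k / math.factorial(k) for k in range(terms))

errors = []
for terms in (2, 4, 8):
    approx = exp_taylor(1.0, terms)
    errors.append(abs(math.e - approx))
    print(f"{terms} terms: {approx:.6f}, truncation error {errors[-1]:.2e}")
```

Keeping more terms shrinks the truncation error, but the round-off error of floating-point arithmetic never disappears entirely.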

3.1 Fitting Models to Data Graphically

We want as many data points as possible over the interval in which the model curve will be used, and the spacing of the data points within that interval also matters. The accuracy of the data points should be assessed every time before fitting a model.

Visual Model Fitting with the Original Data

Generally, when there are more than two data points, they do not all lie on the same line, so there will be vertical distances between the points and any line we draw. We call these vertical distances absolute deviations. For the best-fitting line, we might aim to minimize the sum of these absolute deviations. However, such a line can still leave individual points with large absolute deviations. An alternative is the line that minimizes the largest deviation from any point.
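The two visual criteria can rank candidate lines differently. This minimal sketch uses hypothetical data and two hypothetical candidate lines $y = a + bx$; the helper `absolute_deviations` is written for the example.

```python
# Hypothetical data: four points on y = x and one point well above it.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 4.0, 7.0]

def absolute_deviations(a, b, xs, ys):
    """Vertical distance |y_i - (a + b*x_i)| from each point to the line y = a + b*x."""
    return [abs(y - (a + b * x)) for x, y in zip(xs, ys)]

# Two hypothetical candidate lines, given as (intercept, slope).
for a, b in [(0.0, 1.0), (1.0, 1.0)]:
    devs = absolute_deviations(a, b, xs, ys)
    print(f"y = {a} + {b}x: sum = {sum(devs):.2f}, largest = {max(devs):.2f}")
```

The line $y = x$ has the smaller sum (2 versus 5), while $y = 1 + x$ has the smaller largest deviation (1 versus 2), so the choice of criterion decides which line counts as "best".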

Transforming the Data

Sometimes the relationship is complex, so its graph is difficult to fit by eye. To obtain simpler graphs, we can plot not only $x$ or $y$ on the axes but also functions of $x$ and $y$, such as $\ln y$ or $1/x$. This requires some extra computation but can produce simpler patterns, such as straight lines.
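As one example of such a transformation, a relationship of the form $y = Ce^{kx}$ becomes the straight line $\ln y = \ln C + kx$ when $\ln y$ is plotted against $x$. The sketch below uses hypothetical data generated from $y = 2e^{0.5x}$ and checks that the transformed points have a constant slope, then undoes the transformation to recover $C$ and $k$.

```python
import math

# Hypothetical data generated from y = 2 * e^(0.5 x); plotted as (x, y) this is a curve.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Transform the vertical axis: plot ln(y) against x instead of y against x.
log_ys = [math.log(y) for y in ys]

# On a straight line, the slope between every pair of consecutive points is the same.
slopes = [(log_ys[i + 1] - log_ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
print(slopes)

# Undo the transformation: slope = k and intercept = ln C.
k = slopes[0]
C = math.exp(log_ys[0] - k * xs[0])
print(f"C ≈ {C:.4f}, k ≈ {k:.4f}")
```

With real, noisy data the transformed points would only be roughly collinear, and a line would then be fitted to them.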

3.2 Analytic Methods of Model Fitting

Chebyshev Approximation Criterion

The Chebyshev Approximation Criterion minimizes the largest absolute deviation. Given a function $y = f(x)$ and a collection of $n$ data points $(x_{i}, y_{i})$, it chooses $f$ to minimize the number

$$\max_{i = 1, 2, \dots, n} |y_{i} - f(x_{i})|$$

A residual is the difference between an observed value and the value predicted by the model: $r_{i} = y_{i} - f(x_{i})$. Note that residuals can be positive or negative, but absolute deviations are always nonnegative. For every data point there is a residual $r_{i}$. Let $r$ be the largest absolute value among all the $r_{i}$; we aim to minimize $r$. Thus $r$ must satisfy

$$|r_{i}| \leq r \quad \Longleftrightarrow \quad -r \leq r_{i} \leq r, \qquad i = 1, 2, \dots, n$$

The condition $|r_{i}| \leq r$ can be expressed as the pair $r - r_{i} \geq 0$ and $r + r_{i} \geq 0$. Substituting $r_{i} = y_{i} - f(x_{i})$ gives

$$r - (y_{i} - f(x_{i})) \geq 0, \qquad r + (y_{i} - f(x_{i})) \geq 0, \qquad i = 1, 2, \dots, n$$

Minimizing $r$ subject to these $2n$ constraints is a linear program whenever $f$ is linear in its parameters.
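Here is a minimal sketch of the Chebyshev criterion for the special case of a straight line $f(x) = a + bx$ (an assumption; the criterion applies to other model forms too). The constraint form is usually handed to a linear-programming solver. To keep the example dependency-free, this sketch does not solve that linear program. It relies instead on a property specific to lines: for a fixed slope $b$, the best intercept puts the line midway between the extreme values of $y_{i} - bx_{i}$, and the resulting largest deviation, half their range, is a convex function of $b$. A ternary search over slopes therefore finds the minimum. The function name `chebyshev_line`, the slope bracket $[-1000, 1000]$, and the data are hypothetical.

```python
def chebyshev_line(xs, ys, b_lo=-1000.0, b_hi=1000.0, iterations=200):
    """Fit y = a + b*x minimizing max_i |y_i - (a + b*x_i)|.

    For a fixed slope b, the best intercept is the midpoint of the range of
    y_i - b*x_i, and the resulting largest deviation is half that range.
    That half-range is convex in b, so ternary search over [b_lo, b_hi]
    (assumed to contain the optimal slope) finds the best slope.
    """
    def half_range(b):
        e = [y - b * x for x, y in zip(xs, ys)]
        return (max(e) - min(e)) / 2

    lo, hi = b_lo, b_hi
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if half_range(m1) < half_range(m2):
            hi = m2   # the minimum cannot lie to the right of m2
        else:
            lo = m1   # the minimum cannot lie to the left of m1
    b = (lo + hi) / 2
    e = [y - b * x for x, y in zip(xs, ys)]
    a = (max(e) + min(e)) / 2
    return a, b, (max(e) - min(e)) / 2   # intercept, slope, minimized largest deviation r

# Hypothetical data: the best line is y = 0.5, with largest deviation 0.5.
a, b, r = chebyshev_line([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
print(a, b, r)
```

At the optimum, the largest deviation is attained by several points at once (here all three), which is typical of Chebyshev fits.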

The weakness of the Chebyshev Approximation Criterion is that it gives too much weight to a single point with a large deviation, such as an outlier.

Minimizing the Sum of the Absolute Deviations

Given a function $y = f(x)$ and a collection of $n$ data points $(x_{i}, y_{i})$, we want to minimize

$$\sum_{i=1}^{n} |y_{i} - f(x_{i})|$$

Here $R_{i} = |y_{i} - f(x_{i})|$, $i = 1, 2, \dots, n$, is the absolute deviation at each point, and the criterion minimizes their sum $R_{1} + R_{2} + \dots + R_{n}$.
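For a straight line $f(x) = a + bx$ (an assumption made for this sketch), a useful fact is that some line minimizing the sum of absolute deviations passes through at least two of the data points, provided the data contain at least two distinct $x$ values. The brute-force sketch below checks every non-vertical line through a pair of points. It is $O(n^{3})$, which is fine for small illustrative data; a linear-programming solver would be the usual choice for larger problems. The name `lad_line` and the data are hypothetical.

```python
from itertools import combinations

def lad_line(xs, ys):
    """Fit y = a + b*x minimizing the sum of absolute deviations |y_i - (a + b*x_i)|.

    Some optimal line passes through at least two data points, so it is
    enough to try every non-vertical line through a pair of points.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        total = sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))
        if best is None or total < best[2]:
            best = (a, b, total)
    return best  # (intercept, slope, minimized sum of absolute deviations)

# Hypothetical data: three points on y = x and one point (1, 5) far off the line.
print(lad_line([0.0, 1.0, 2.0, 3.0], [0.0, 5.0, 2.0, 3.0]))
```

The result is the line $y = x$ with a total absolute deviation of 4: the off-line point contributes its full deviation but does not pull the line toward itself.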

Least Squares Criterion

Given a function $y = f(x)$ and a collection of $n$ data points $(x_{i}, y_{i})$, we want to minimize

$$\sum_{i=1}^{n} |y_{i} - f(x_{i})|^{2}$$

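Let $R_{i} = |y_{i} - f(x_{i})|$, $i = 1, 2, \dots, n$. For three data points ($n = 3$), consider the $R_{i}$ as the scalar components of a deviation vector $R = R_{1}\mathbf{i} + R_{2}\mathbf{j} + R_{3}\mathbf{k}$. In other words, we want to make the vector $R$ as short as possible. Its length is

$$|R| = \sqrt{R_{1}^{2} + R_{2}^{2} + R_{3}^{2}}$$

Because the square root is an increasing function, minimizing $|R|$ is equivalent to minimizing $|R|^{2} = R_{1}^{2} + R_{2}^{2} + R_{3}^{2}$, which is exactly the sum in the Least Squares Criterion. The same argument works for any $n$, with $R$ a vector in $n$ dimensions.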
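For a straight line $f(x) = a + bx$ (an assumption for this sketch), setting the partial derivatives of the squared-deviation sum with respect to $a$ and $b$ to zero gives the normal equations, whose solution is $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$, where $S_{xx} = \sum (x_{i} - \bar{x})^{2}$ and $S_{xy} = \sum (x_{i} - \bar{x})(y_{i} - \bar{y})$. The sketch uses three hypothetical data points so the deviation vector has exactly the three components $R_{1}, R_{2}, R_{3}$ discussed above. Note that it stores the signed deviations $y_{i} - f(x_{i})$; their squares equal the squares of the $R_{i}$.

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x minimizing sum (y_i - a - b*x_i)^2 via the normal equations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: three points, so R has three components.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 0.0]
a, b = least_squares_line(xs, ys)

R = [y - (a + b * x) for x, y in zip(xs, ys)]   # signed components of the deviation vector
length_squared = sum(r * r for r in R)          # |R|^2, the quantity least squares minimizes
print(a, b, R, length_squared)
```

Here the fit is the horizontal line $y = 1/3$ and $|R|^{2} = 2/3$; any other choice of $a$ and $b$ gives a longer deviation vector.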

Relating the Criteria

We now compare the Chebyshev Approximation Criterion with the Least Squares Criterion.

Fit the model using the Chebyshev Approximation Criterion, and let the resulting absolute deviations be $c_{i} = |y_{i} - f(x_{i})|$, $i = 1, 2, \dots, n$. Let $c_{max}$ be the largest of these deviations; it is the smallest largest deviation any fit can achieve, since that is exactly what the Chebyshev criterion minimizes.

Fit the model using the Least Squares Criterion, and let the resulting absolute deviations be

$$d_{i} = |y_{i} - f(x_{i})|, \quad i = 1, 2, \dots, n$$

Let $d_{max}$ be the largest of these deviations. Because no fit has a smaller largest deviation than the Chebyshev fit,

$$c_{max} \leq d_{max}$$

By the definition of the Least Squares Criterion, the sum of the squared deviations $d_{i}^{2}$ is the smallest possible, so

$$d_{1}^{2} + d_{2}^{2} + \dots + d_{n}^{2} \leq c_{1}^{2} + c_{2}^{2} + \dots + c_{n}^{2}$$

Since $c_{i} \leq c_{max}$ for every $i$,

$$d_{1}^{2} + d_{2}^{2} + \dots + d_{n}^{2} \leq n c_{max}^{2}$$

Dividing by $n$ and taking the square root isolates $c_{max}$:

$$c_{max} \geq \sqrt{\frac{d_{1}^{2} + d_{2}^{2} + \dots + d_{n}^{2}}{n}}$$

For ease of discussion, define

$$D = \sqrt{\frac{d_{1}^{2} + d_{2}^{2} + \dots + d_{n}^{2}}{n}}$$

Thus,

$$D \leq c_{max} \leq d_{max}$$

If the gap $d_{max} - D$ is large, the Chebyshev Approximation Criterion might produce a noticeably smaller largest deviation than least squares, so it may be worth applying. If $d_{max}$ is close to $D$, the least-squares fit already comes close to the best possible largest deviation.
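The bound $D \leq c_{max} \leq d_{max}$ can be checked numerically. This sketch assumes a straight-line model $f(x) = a + bx$ and hypothetical data. It computes $D$ as the root mean square of the least-squares deviations, $\sqrt{(d_{1}^{2} + \dots + d_{n}^{2})/n}$, and obtains $c_{max}$ by the same ternary search over slopes used for the Chebyshev line (not a linear-programming solver). Function names and the slope bracket $[-1000, 1000]$ are hypothetical.

```python
import math

def least_squares_line(xs, ys):
    """Least-squares line y = a + b*x from the normal equations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

def chebyshev_max_deviation(xs, ys, b_lo=-1000.0, b_hi=1000.0, iterations=200):
    """Smallest achievable largest deviation for a line, via ternary search over the slope."""
    def half_range(b):
        # With slope b fixed, the best intercept leaves a largest deviation of half this range.
        e = [y - b * x for x, y in zip(xs, ys)]
        return (max(e) - min(e)) / 2
    lo, hi = b_lo, b_hi
    for _ in range(iterations):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if half_range(m1) < half_range(m2):
            hi = m2
        else:
            lo = m1
    return half_range((lo + hi) / 2)

# Hypothetical data.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 0.0]

a, b = least_squares_line(xs, ys)
d = [abs(y - (a + b * x)) for x, y in zip(xs, ys)]   # least-squares deviations d_i
d_max = max(d)
D = math.sqrt(sum(di ** 2 for di in d) / len(d))
c_max = chebyshev_max_deviation(xs, ys)
print(f"D = {D:.4f} <= c_max = {c_max:.4f} <= d_max = {d_max:.4f}")
```

For this data the three values are about 0.471, 0.5 and 0.667, so the interval containing $c_{max}$ is fairly narrow.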
