# Introduction

Artificial neural networks (ANNs) are computational models loosely inspired by their biological counterparts. Artificial neurons, the elementary units of an ANN, appear to have been first introduced by McCulloch and Pitts in 1943 to model biological neurons using algorithms and mathematics. An ANN is based on a set of connected artificial neurons. Each connection, like a synapse in a biological brain, can transmit a signal from one artificial neuron to another. Since the early work of Hebb, whose cell assembly theory attempted to explain synaptic plasticity, considerable effort has been devoted to ANNs. Although the perceptron was created by Rosenblatt for pattern recognition by training an adaptive McCulloch-Pitts neuron model, it is incapable of representing the exclusive-or function. A key trigger for renewed interest in neural networks and learning was Werbos’s back-propagation algorithm, which makes the training of multi-layer networks feasible and efficient.

# Methods

The Multi-Layered Perceptron (MLP) is a feedforward neural network and has been among the most widely used neural network architectures. The Error Back Propagation (BP) algorithm is implemented as the training method for MLPs.

## Multi-Layered Perceptron

The MLP is a fully-connected feedforward neural network trained with supervised learning. An MLP consists of an input layer, an output layer, and at least one hidden layer. The fundamental structure of an MLP is shown schematically in Fig. 1:

In the input layer, the neurons simply pass the information on; i.e., the activation functions of this layer are identity functions. Each neuron of the input layer is fully connected to the first hidden layer, so each of these neurons has multiple outgoing connections. With $M$ neurons in the input layer and $N$ neurons in the hidden layer, each connection is assigned a weight $v_{nm}$. A bias term is added to the total weighted sum of the inputs, serving as a threshold that shifts the activation function. The propagation function, which computes the hidden-layer input $X_j$ to the neuron $x(j)$ from the outputs $Z_m$ of the predecessor neurons, has the form

$$X_j = \sum_{m=1}^{M} v_{jm} Z_m + \theta,$$

where $\theta$ denotes the bias term.

The activation function, which defines the output of a node, was chosen as the hyperbolic tangent and the softsign in this case. Their equations, plots, and derivatives are given in TABLE 1.
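As a brief illustration (a sketch assuming a NumPy environment; the function names are ours, not from TABLE 1), the two activations and the derivatives that back-propagation needs can be written as:

```python
import numpy as np

# Hyperbolic tangent activation and its derivative.
def tanh(x):
    return np.tanh(x)

def tanh_prime(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

# Softsign activation and its derivative.
def softsign(x):
    return x / (1.0 + np.abs(x))

def softsign_prime(x):
    # d/dx softsign(x) = 1 / (1 + |x|)^2
    return 1.0 / (1.0 + np.abs(x)) ** 2
```

Both functions are smooth and squash their input into $(-1, 1)$; softsign saturates more slowly, which is why the report uses it in a deeper structure later on.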

MLPs’ scheme essentially has three features:

• There are no connections between neurons within the same layer, and neurons have no connections to themselves.
• Full connections exist only between adjacent layers.
• Training consists of two phases: the feed-forward transmission of information and the backward transmission of error.

## Error Back Propagation

Back Propagation is a method used in ANNs to calculate the gradient that is needed to update the weights between layers, and it is commonly used with the gradient descent optimization algorithm. Calculating the adjustments of the weights in Fig. 1 is chosen as an example to introduce the BP algorithm. We denote the parameter $\delta$ with respect to the $i$-th layer by $\delta^{(i)}$.

For calculating the weights between the hidden layer and the output layer, the loss function is defined as

$$E_q = \frac{1}{2}\sum_{j}\left(\widehat{y}_{qj} - y_{qj}\right)^2,$$

where $\widehat{y}_{qj}$ is the expected output and $y_{qj}$ is the actual output. Based on the gradient descent algorithm with learning rate $\mu$, which controls how much the algorithm adjusts the network weights with respect to the loss gradient, and the input $s_j$ of the activation function, we have

$$\Delta w_{jn} = -\mu \frac{\partial E}{\partial w_{jn}} = -\mu \frac{\partial E}{\partial s_j}\frac{\partial s_j}{\partial w_{jn}},$$

with

$$\frac{\partial s_j}{\partial w_{jn}} = z_n,$$

and we have

$$-\frac{\partial E}{\partial s_j} = \left(\widehat{y}_j - y_j\right) f'(s_j) = \delta_j,$$

where $\delta_j$ denotes the error of the output layer; thus, the adjustment of $w$ shall be

$$\Delta w_{jn} = \mu\, \delta_j z_n.$$
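To make the update concrete, here is a single-step numerical sketch of the delta rule for one output weight with a tanh activation; all numbers are illustrative and not taken from the report:

```python
import math

# One gradient-descent update of a single output weight, following the
# delta rule: delta_j = (y_hat - y) * f'(s_j), dw = mu * delta_j * z_n.
mu = 0.1           # learning rate (illustrative)
z_n = 0.5          # output of one hidden neuron (illustrative)
w_jn = 0.8         # current weight (illustrative)
y_hat = 1.0        # expected output

s_j = w_jn * z_n              # input to the output neuron's activation
y = math.tanh(s_j)            # actual output before the update
delta_j = (y_hat - y) * (1.0 - y ** 2)  # tanh'(s) = 1 - tanh(s)^2
w_jn += mu * delta_j * z_n    # the weight moves to reduce the squared error
```

After the update, the output $f(w_{jn} z_n)$ lies strictly closer to the target than before, which is exactly what the sign of $\delta_j$ guarantees for a small enough $\mu$.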

Consider calculating the weights between the input layer and the hidden layer:

$$\Delta v_{nm} = -\mu \frac{\partial E}{\partial v_{nm}} = -\mu \frac{\partial E}{\partial s_n}\frac{\partial s_n}{\partial v_{nm}},$$

with

$$\frac{\partial s_n}{\partial v_{nm}} = Z_m,$$

and

$$-\frac{\partial E}{\partial s_n} = \sum_j \left(-\frac{\partial E}{\partial s_j}\right)\frac{\partial s_j}{\partial s_n} = \sum_j \delta_j \frac{\partial s_j}{\partial s_n},$$

where

$$\frac{\partial s_j}{\partial s_n} = w_{jn}\, f'(s_n),$$

thus

$$\delta_n = f'(s_n)\sum_j \delta_j w_{jn}, \qquad \Delta v_{nm} = \mu\, \delta_n Z_m.$$

In summary, the equation describing the adjustment of the weights is

$$\Delta w^{(l)}_{ji} = \mu\, \delta^{(l)}_j z^{(l-1)}_i,$$

where $\delta^{(l)}_j$ for the output layer is

$$\delta^{(l)}_j = \left(\widehat{y}_j - y_j\right) f'\!\left(s^{(l)}_j\right)$$

and for the hidden layers is

$$\delta^{(l)}_j = f'\!\left(s^{(l)}_j\right)\sum_k \delta^{(l+1)}_k w^{(l+1)}_{kj},$$

respectively.

The process of training MLP with BP algorithm is shown in Algorithm 1.
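As a minimal, self-contained sketch of such a training loop (our own NumPy illustration, not the report's implementation; the layer width, learning rate, and epoch count are all illustrative), fitting $f(x) = \sin(x)$ with one tanh hidden layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: samples of f(x) = sin(x) on [0, 2*pi].
x = np.linspace(0, 2 * np.pi, 9).reshape(-1, 1)
y_hat = np.sin(x)

# One hidden layer with tanh activation and a linear output layer.
n_hidden = 30
V = rng.normal(0, 0.5, (1, n_hidden))   # input -> hidden weights
b_v = np.zeros(n_hidden)                # hidden biases (theta)
W = rng.normal(0, 0.5, (n_hidden, 1))   # hidden -> output weights
b_w = np.zeros(1)                       # output bias

mu = 0.01  # learning rate
for epoch in range(20000):
    # Feed-forward pass.
    s_hidden = x @ V + b_v
    z = np.tanh(s_hidden)
    y = z @ W + b_w

    # Backward pass: deltas from the derivation above.
    delta_out = y_hat - y                          # output delta (linear output)
    delta_hid = (delta_out @ W.T) * (1 - z ** 2)   # hidden delta, tanh'(s) = 1 - z^2

    # Gradient-descent weight updates.
    W += mu * z.T @ delta_out
    b_w += mu * delta_out.sum(axis=0)
    V += mu * x.T @ delta_hid
    b_v += mu * delta_hid.sum(axis=0)

# Final prediction and mean half-squared loss on the training samples.
y = np.tanh(x @ V + b_v) @ W + b_w
loss = 0.5 * np.mean((y_hat - y) ** 2)
```

The report's actual structures are larger (600 and 500 hidden nodes) and use the stopping criteria on $E_{avg}$ described in the next section; this sketch only shows the shape of the feed-forward and back-propagation phases.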

# Results

In this section, we design different model structures to address curve-fitting problems.

## Network Design

We designed two different structures, which are shown in Fig. 2.

The structure with one hidden layer of 600 nodes was applied to fitting $f(x) = \sin(x)$ and $f(x_1, x_2) = \frac{\sin(x_1)}{x_1} \frac{\sin(x_2)}{x_2}$. For $y = |\sin(x)|$, we chose the structure with 2 hidden layers to reach a better regression performance.

## $f(x) = \sin(x)$

We trained the network with 9 samples, using $E = \frac{1}{2} (\widehat{y} - y)^2$ as the loss function. The training process stops when $E_{avg} = (\sum_{i = 1}^{epoch} E_i) /epoch < 2\times 10^{-4}$. The results with 361 test samples are shown in Fig. 3:

Runtime loss $E$ and average loss $E_{avg}$ are shown in Fig. 4, where the average loss on the test set is $9.3599\times 10^{-5}$.

We notice that the loss drops drastically within the first $50,000$ epochs; after that, decreasing the average loss $E_{avg}$ from $0.1$ to $0.002$ requires more than $1\times 10^6$ epochs.

The absolute error is shown in Fig. 5:

The maximum error between the actual and predicted values is less than 0.05, which demonstrates excellent performance when fitting $f(x) = \sin(x)$.

## $f(x) = |sin(x)|$

When it comes to a more complex nonlinear curve, the fitting error of the elementary MLP structure with only one hidden layer is far from satisfactory. A hidden layer with the softsign activation function was therefore added to the basic MLP structure. Training this structure with 9 samples, we arrive at the following results:

Fig. 7 shows the loss decreased over training epochs:

From the figure, the loss chatters within the first 500,000 epochs, while the average loss descends at a smooth, nearly constant rate. Although a training loss below $1\times 10^{-5}$ after $3\times 10^6$ epochs guarantees good performance on the training samples, as observed in Fig. 6, the test error (more than 0.2) is unacceptable for some test samples. We assumed that the lack of training samples caused the network to overfit, so we expanded the training set to 29 independent $(x, y)$ samples to reach better performance.

When average loss is less than 0.005, we obtained the following result, as shown in Fig. 8:

Note that $y = |\sin(x)|$ has a non-differentiable point in $(0, 2\pi)$. From Fig. 9, this non-differentiable point has the maximal error compared to the other points; i.e., providing more training samples cannot reinforce the nonlinear fitting ability of a given MLP model.

Still, adding training samples improves prediction performance on the test set. From Fig. 9, although one test point has an error of more than 0.1, the absolute errors of the other test points are less than 0.02; i.e., this approach provides a more robust solution than 9 training samples under the same loss tolerance. The training loss of this approach is shown in Fig. 10.

## $f(x, y) = \frac{\sin(x)}{x}\frac{\sin(y)}{y}$

In this section, let us consider a function with two inputs. $11\times 11$ samples are given for training with the basic MLP structure. The hidden layer has 500 nodes, and training stops when $E_{avg} < 0.002$. Fig. 11 shows the actual surface and the surface predicted with $21\times 21$ samples, respectively.

We plot $E$ and $E_{avg}$ for epochs in $[1, 100{,}000]$ in Fig. 12, from which we observe that $E_{avg}$ decreases rapidly from 1.1 to 0.1 within 1,000 epochs. Thus, the training process demonstrates the efficiency and robustness of the back-propagation algorithm.

The test error is shown in Fig. 13, from which we observe that the maximal test error is less than 0.15.

# Conclusions

In this report, we implemented a multilayer perceptron to fit three specific non-linear functions. Simulation results illustrate the accuracy of artificial neural networks and the efficiency of the back-propagation algorithm.