Fundamental inputs

One of the goals of the study is to determine if fundamental inputs play a useful role in
network predictive capability. Previous research undertaken into fundamental indexation by
Arnott et al. (2005) and ANNs with fundamental inputs by Ellis and Wilson (2005) and
Kryzanowski et al. (1992) has been used to assist in the selection of ANN fundamental
inputs. Other fundamental inputs were also included in an attempt to achieve maximum
probability of specifying an accurate model. All fundamental input data was obtained on a
Daily, weekly and monthly basis. As most of the fundamental inputs are calculated by
reference to company balance sheets
The following fundamental inputs were used as network inputs:
• Price
Price inputs included open, high, low, and close prices.
o Open:
The price at which the first selling or purchasing operation is done in a certain period of
time; when the daily data of a specific indicator is being analyzed, it is the price of the first
operation of the day.
o High:
The highest price the indicator reaches in a certain period of time. It may be a year, a day or
even an hour. It refers to the point at which sellers outnumber buyers.


o Low:
The lowest price the indicator reaches in a certain period of time. It may be a year or a day.
o Close:
The price at which the indicator closes the day, or the final price the indicator reaches in a
certain period of time.
• Volume
The number of shares traded during the week.

7.1.4. Technical indicators:

• MACD
Moving average convergence divergence (MACD) is a trend-following momentum indicator
that shows the relationship between two moving averages of prices. The MACD is calculated
by subtracting the 26-day exponential moving average (EMA) from the 12-day EMA. A
nine-day EMA of the MACD, called the “signal line”, is then plotted on top of the MACD,
functioning as a trigger for buy and sell signals.
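As a concrete illustration, a minimal pandas sketch of this calculation might look as follows; the function name and the use of pandas are illustrative assumptions, not the implementation used in this study:

```python
import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """MACD line and signal line as defined above:
    MACD = 12-day EMA - 26-day EMA; signal = 9-day EMA of the MACD."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line
```

A crossing of the MACD line above the signal line is then read as a buy trigger, and a crossing below it as a sell trigger.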

• Momentum
Momentum is the rate of acceleration of a security’s price or volume. In technical analysis,
momentum is considered an oscillator and is used to help identify trend lines.
In general, momentum refers to the force or speed of movement; it is usually defined as a
rate. In the world of investments, momentum refers to the rate of change on price movements
for a particular asset – that is, the speed at which the price is changing.
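Since the study does not give its momentum formula explicitly, a common definition is assumed in the sketch below: the n-period price change, with n = 10 as an illustrative default.

```python
import pandas as pd

def momentum(close: pd.Series, n: int = 10) -> pd.Series:
    """n-period momentum: the change in price over the last n periods."""
    return close - close.shift(n)
```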

• RSI
The relative strength index (RSI) is a technical momentum indicator that compares the
magnitude of recent gains to recent losses in an attempt to determine overbought and
oversold conditions of an asset. It is calculated using the following formula:

RSI = 100 − 100 / (1 + RS)   Equation (65) – RSI General Equation

where RS = average of x days' up closes / average of x days' down closes.
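A direct transcription of Equation (65) into pandas might look like the following; the window length n = 14 is the conventional default and is an assumption here, as is the use of simple rather than smoothed averages:

```python
import pandas as pd

def rsi(close: pd.Series, n: int = 14) -> pd.Series:
    """RSI per Equation (65): RSI = 100 - 100 / (1 + RS), where RS is the
    average of the last n up-closes over the average of the last n down-closes."""
    change = close.diff()
    up = change.clip(lower=0).rolling(n).mean()       # average up-closes
    down = (-change.clip(upper=0)).rolling(n).mean()  # average down-closes
    rs = up / down
    return 100 - 100 / (1 + rs)
```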


7.3. Data pre-processing
All the neural network input and output data was pre-processed. Several pre-processing
techniques were adopted in order to ensure that the neural network learned quickly and gave
better performance.

1. The first pre-processing algorithm:
Process the input and target data by deleting unnecessary columns from the data file,
keeping only the date, open, low, high, close and volume columns, and by deleting any rows
that contain constant values. Constant values do not provide the neural network with any
useful information.

2. The second pre-processing algorithm:
Check the length of the downloaded file. It must contain at least 100 data rows, which
means at least 100 days for daily input, 100 weeks for weekly input and 100 months for
monthly input, to give the ANN a chance to train well and predict the price with a minimum
percentage of error.

3. The third pre-processing algorithm:
Add four columns at the end of the data file and use them as inputs together with the
fundamental inputs already present in the file (open, low, high, close, volume). Calculate the
technical indicators (Momentum, RSI, MACD and MACD Signal), so that nine inputs in
total were used.
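A minimal sketch of this step, reusing the indicator functions sketched earlier (the column names are illustrative assumptions):

```python
import pandas as pd

def add_technical_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Append the four technical-indicator columns to the five
    fundamental columns, giving the nine network inputs."""
    out = df.copy()
    out["momentum"] = momentum(out["close"])
    out["rsi"] = rsi(out["close"])
    out["macd"], out["macd_signal"] = macd(out["close"])
    return out
```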

4. The fourth pre-processing algorithm:
Scale each trend output price between its minimum and maximum values by using a
log-sigmoid function, so that the trend output lies between 0 (Down) and +1 (Up). All
network input data is likewise scaled to between 0 and 1 (or sometimes -1 and 1).
The reason for pre-processing neural network input trend data to values between 0 and 1 is
the way sigmoid functions squash the input data. The hyperbolic tangent or log-sigmoid
transfer function used squashes all neuron activation levels to values between -1 and +1 or
0 and 1. However, if large positive or negative inputs are utilized without pre-processing,
these values end up saturating the function (i.e. becoming so close to the limit of the
function that, due to rounding, they effectively reach the function limit). For example, when
utilizing a hyperbolic tangent function, once an input value reaches 10, its hyperbolic
tangent equals 0.999999995. Therefore, in order to keep input values within a meaningful
range, the data must be pre-processed.
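One common way to realize this scaling, and the un-scaling mentioned at the end of this section, is simple linear min-max scaling into [0, 1]. The sketch below assumes that approach; it is not necessarily the exact transform used in the study:

```python
import numpy as np

def scale01(x: np.ndarray):
    """Scale a series linearly into [0, 1] so it stays within the
    non-saturating range of the log-sigmoid transfer function.
    Returns the scaled data plus (min, max) for later un-scaling."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), lo, hi

def unscale01(y: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Reverse the scaling applied to the network output."""
    return y * (hi - lo) + lo
```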

5. The fifth pre-processing algorithm:
Eliminate the first row of the data file, which contains the header, and do not count it in the
training process.
6. The sixth pre-processing algorithm:
Check the file format; if it is not in the standard form (date, open, low, high, close,
volume), the file is refused.
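The length and format checks of the second, fifth and sixth algorithms can be sketched together as follows; pandas and the exact error handling are assumptions (note that pandas consumes the header row itself, so it is not counted as a data row, matching the fifth algorithm):

```python
import pandas as pd

REQUIRED = ["date", "open", "low", "high", "close", "volume"]

def load_and_validate(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)  # header row is consumed, not counted as data
    if [c.lower() for c in df.columns] != REQUIRED:   # sixth algorithm
        raise ValueError("file not in standard (date, open, low, high, "
                         "close, volume) format - refused")
    if len(df) < 100:                                 # second algorithm
        raise ValueError("file must contain at least 100 data rows")
    return df
```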

The scaling algorithm used on all input series meant that the network output was also scaled.
Therefore, upon completion of the neural network calculations, a reverse scaling algorithm
was then run on the output data to effectively un-scale the output.

7.4. Frequency of data

There is limited guidance on the optimal frequency of input data. Kaastra and Boyd
(1996) note that:

The data frequency depends upon the goals and objectives of the researcher. A typical
off-floor trader in the stock or commodity futures markets would most likely use daily data
if the neural network forms part of an overall trading system. An investor with a
longer-term horizon may use weekly or monthly data as an input to the neural
network… (p.220)

Hall (1994) reported success in using weekly data as an input to a neural network that beat a
benchmark index. Hall (1994) also outlines the problems connected with determining the
correct predictive period for individual neural networks:

The evolutionary nature of the stock selection problem and the possibility of unmanageable
external events make long-term forecasting unreliable. The large amount of noise found in
this kind of system also makes short-term predictions unreliable. In between, there is some
future period for which good forecasts can be made. Finding the proper predictive period is
a crucial task when modelling evolutionary, complex non-linear dynamic (NLD) systems
such as the stock selection problem; an estimate of the predictive period must be made by
experimentation over many stages or cycles… For relationships of this complexity, there is
no true optimum. The only technique for finding the proper predictive period is repeated
simulation of the whole system, including simulated trading, over comprehensive time
periods. (p.60)

For the purposes of this research, weekly data has been used for the inputs, to predict the
price of the stock at t + 4 (the stock's price in four weeks' time).
Training, testing and validation sets

To implement the artificial neural network, the data set should be partitioned into three
groups: a training set, a validation set and a testing set. Some authors in the field conflate
the terms validation and testing data sets (cf. Kaastra and Boyd, 1996; Beale, Hagan and
Demuth, 2011). To remove all ambiguity, in this research each of the data set divisions is
defined consistently with Beale et al. (2011):


Training set – the set of data used for calculating and updating the network weights
and the biases.

Validation set – the set of data used for monitoring the error during training.
Normally during training, the error on the validation set declines until
over-fitting begins to occur, at which point the error begins to increase again.
Network learning parameters are set to ensure that over-fitting does not occur.

Test set – the out-of-sample set of data used once network training is
finished, to test how accurate the network is and, in this manner, to compare network
performance.
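A minimal chronological split consistent with these definitions might look as follows; the 70/15/15 proportions are an illustrative assumption, not values fixed by the study:

```python
import numpy as np

def split_sets(data: np.ndarray, train: float = 0.70, val: float = 0.15):
    """Partition a time series, in order, into training, validation
    and test sets."""
    n = len(data)
    i, j = int(n * train), int(n * (train + val))
    return data[:i], data[i:j], data[j:]
```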

Good selection of the training data window is vital because, “If the base training window is
too long the model will be slow to react to state changes, but if the training window is too
short, the model may overreact to noise” (Hall, 1994, p.63).

While a static system can be effectively modelled by using a random data division
technique, time series prediction requires the use of moving window testing. Moving
window testing has been described as follows (Kaastra and Boyd, 1996):

Walk-forward testing involves partitioning the data into a series of training and validation
sets. Each set is moved forward through the time series. Walk-forward testing attempts to
simulate real-life trading and tests the robustness of the model through its continuous
retraining on a large out-of-sample data set. In walk-forward testing, the size of the
validation set drives the retraining frequency of the neural network. Frequent retraining is
more time consuming, but it allows the network to adapt more quickly to changing market
conditions. (p.223)


To illustrate how walk-forward testing works: the first window (window #1) partitions the
first piece of data into training, validation and testing sets. To move further along the time
series, the network then forms a new training, validation and testing set. The walk-forward
methodology proceeds until the end of the desired testing time series.
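The window generation this describes can be sketched as follows, with all window lengths left as parameters since the study determined them by experimentation:

```python
def walk_forward_windows(n_rows: int, train_len: int,
                         val_len: int, test_len: int):
    """Yield (train, validation, test) index ranges that slide forward
    through the time series, one test block per window."""
    start = 0
    while start + train_len + val_len + test_len <= n_rows:
        tr = range(start, start + train_len)
        va = range(tr.stop, tr.stop + val_len)
        te = range(va.stop, va.stop + test_len)
        yield tr, va, te
        start += test_len  # advance the whole window forward
```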

The walk-forward methodology was used for all testing. Determining the right training
window length is critical, as the training window acts as the “memory” of the network: the
network learns the relationships between the input variables and the output from the
patterns contained in the training window’s data (Deboeck, 1994). The literature gives little
direction, either in terms of theory or heuristics, for the correct determination of training
periods. Hall (1994) states that:

“Commonly the training window stays fixed as it slides through time. When the training
data window is long, the network is better able to discriminate between noise and the true
system response. Consequently, the accuracy of the system often improves as the training
window is increased. On the other hand, if the present state of the system stays constant, the
minimum length of the historical training window determines how the model will react
during state changes. If the minimum training window is too long, the model will be slow to
react to state changes. Since a great part of the opportunity for beating the market occurs
early in the state changes, the effectiveness of the total trading system will be low. But if
the training window is too short, the model may overreact to noise. It may also overreact to
the fluctuation that typically occurs during the state changes. In effect, the model becomes
unstable during these periods. An investor who responds to an unstable model will be
whipsawed, rapidly trading back and forth between two sorts of portfolios at precisely the
wrong time. Once more, the effectiveness of the total trading system will be low.”
(pp.62-63)


To determine the suitable length of the training window, several networks were created for
each stock so that the following training periods were tested:
Neural network paradigm/structure:

The ANN structure adopted consisted of a 3-layer network comprising an input layer, a
hidden layer and an output layer. This system is a feed-forward network with
back-propagation. A schematic diagram of the neural network adopted is shown in Figure
(7.1), Schematic diagram of artificial neural network implementation. The fully
feed-forward structure was chosen based on the findings of two studies. The first was a
comparative study by Hallas and Dorffner (1998) that looked at the use of feed-forward
networks and recurrent networks for the prediction of 8 different types of non-linear time
series data. This study compared the performance of three different types of feed-forward
networks and seven types of recurrent networks at predicting five different non-linear time
series. Hallas and Dorffner (1998) concluded that the feed-forward networks generally
performed better and that their study pointed to “serious limitations of recurrent neural
networks applied to nonlinear prediction tasks” (p.647). The other study, by Dematos, Boyd,
Kermanshahi, Kohzadi and Kaastra (1996), compared the results of using recurrent and
feed-forward neural networks to predict the Japanese yen/US dollar exchange rate. This
study also concluded that despite the relative simplicity of the feed-forward network
structure, the feed-forward network outperformed the recurrent network (Dematos et al.,
1996).

For each of the stocks and portfolios in the data set there are 9 inputs: 5 of the inputs are
fundamental inputs and 4 of the inputs are technical indicators. The network was configured
so that during the learning period, the network would try to predict the price of each stock in
four weeks’ time. For example, at the beginning of the learning period (t = 0), all 9 inputs
(including the stock price) were read into the network. The network then simulated what the
stock price would be over a horizon of not less than 100 days, i.e. about 24 weeks (t ≥ 24),
and compared the simulated price to the real price. The learning rate was set to 0.3; the
resulting error, required to be smaller than a threshold delta, was then back-propagated
through the network and the process was repeated using the next input vector.
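The training loop described here can be sketched as a minimal single-hidden-layer network with online back-propagation. The learning rate of 0.3 and the sigmoid activations come from the text; the hidden-layer size, weight initialization and the numpy formulation are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyMLP:
    """9 inputs -> one log-sigmoid hidden layer -> one sigmoid output,
    trained one input vector at a time with learning rate 0.3."""

    def __init__(self, n_in: int = 9, n_hidden: int = 5, lr: float = 0.3):
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def forward(self, x: np.ndarray) -> float:
        self.h = sigmoid(self.W1 @ x + self.b1)
        return sigmoid(self.W2 @ self.h + self.b2)

    def train_step(self, x: np.ndarray, target: float) -> float:
        y = self.forward(x)
        err = y - target                        # simulated vs real price
        dy = err * y * (1.0 - y)                # output-layer delta
        dh = (self.W2 * dy) * self.h * (1.0 - self.h)
        self.W2 -= self.lr * dy * self.h        # back-propagate the error
        self.b2 -= self.lr * dy
        self.W1 -= self.lr * np.outer(dh, x)
        self.b1 -= self.lr * dh
        return abs(err)                         # compared against delta
```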

7.7. Number of hidden layers

The use of hidden layers is what gives artificial neural networks the ability to generalize
(Kaastra & Boyd, 1996). From a theoretical perspective, a back-propagation network with a
sufficient number of hidden neurons is able to accurately model any continuous function. The
use of multiple hidden layers can improve network performance up to a point; however, the
use of too many hidden layers increases computation time and leads to overfitting (Kaastra
& Boyd, 1996). Kaastra and Boyd (1996) have explained how neural networks overfit as
follows:

“Overfitting occurs when a forecasting model has too few degrees of freedom.
In other words, it has relatively few observations in relation to its parameters, and it
is therefore able to memorize individual points rather than learn the general patterns.
In the case of ANNs, the number of weights, which is related to the number of
neurons in the hidden layer, and the size of the training set (the number of
observations) determine the likelihood of overfitting. The greater the number of
weights relative to the size of the training set, the greater the ability of the network to
memorize the idiosyncrasies of individual observations. As a result, generalization to
the validation set is lost and the model is of little use in actual prediction.” (p.225)

A network that suffers from overfitting is generally considered of little use (Hall, 1994):

“In general, for evolutionary, complex NLD non-linear dynamic systems, the
accuracy of a model in explaining some local condition is inversely proportional to its
usefulness in discovering and explaining future states.” (p.57)

In the absence of an established theoretical framework for selecting the number of hidden
layers, the heuristic of using a maximum of two hidden layers is generally considered
appropriate (Kaastra & Boyd, 1996). For the purposes of this research, a single hidden layer
has been utilized.

7.8. Number of hidden neurons

There is no established method for selecting the optimum number of neurons for the hidden
layer. Previous research in the area has relied on testing to determine the optimum number
for the particular network in question (Tilakaratne et al., 2008). Similarly, Yoon and Swales
(1991) found that accuracy and performance improved as the number of hidden units
increased up to a certain point: “Increasing the number of hidden units beyond this point
produced no further improvement but impaired the network’s performance” (p.491).
Some heuristics have been proposed by previous researchers (refer to Table 7, Heuristics to
estimate hidden layer size).

