Fundamental inputs

One of the goals of the study is to determine if fundamental inputs play a useful role in

network predictive capability. Previous research undertaken into fundamental indexation by

Arnott et al. (2005) and ANNs with fundamental inputs by Ellis and Wilson (2005) and

Kryzanowski et al. (1992) has been used to assist in the selection of ANN fundamental

inputs. Other fundamental inputs were also included in an attempt to achieve maximum

probability of specifying an accurate model. All fundamental input data were obtained on a

daily, weekly and monthly basis. Most of the fundamental inputs are calculated by

reference to company balance sheets.

The following fundamental inputs were used as network inputs:

• Price

Price inputs included open, high, low, and close prices.

o The Opening Price:

The price at which the first buying or selling transaction takes place in a certain period of

time; when the daily data of a specific indicator are analysed, it is the price of the first trade of the day.

o High:

The highest price the indicator reaches in a certain period of time, which may be a year, a day or

even an hour. It marks the point at which sellers outnumber buyers.

o Low:


The lowest price the indicator reaches in a certain period of time. It may be a year or a day.

o Close:

The price at which the security closes for the day, or the final price the indicator reaches in

a certain period of time.

• Volume

The number of shares traded during the week.

7.1.4. Technical indicators:

• MACD

Moving average convergence divergence (MACD) is a trend-following momentum indicator

that shows the relationship between two moving averages of prices. The MACD is calculated

by subtracting the 26-day exponential moving average (EMA) from the 12-day EMA. A

nine-day EMA of the MACD, called the “signal line”, is then plotted on top of the MACD,

functioning as a trigger for buy and sell signals.
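The MACD calculation described above can be sketched as follows. This is a minimal illustration: the function names and the EMA smoothing-factor convention alpha = 2 / (n + 1) are assumptions, not taken from the text.

```python
def ema(prices, n):
    """Exponential moving average with smoothing factor 2 / (n + 1)."""
    alpha = 2.0 / (n + 1)
    out = [prices[0]]                     # seed with the first price
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line): the 12-day EMA minus the 26-day EMA,
    with a 9-day EMA of the MACD line as the signal line."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    return macd_line, signal_line
```

A crossover of the MACD line above its signal line is then read as a buy trigger, and a crossover below it as a sell trigger.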

• Momentum

Momentum is the rate of acceleration of a security’s price or volume. In technical analysis,

momentum is considered an oscillator and is used to help identify trend lines.

In general, momentum refers to the force or speed of movement; it is usually defined as a

rate. In the world of investments, momentum refers to the rate of change of price movements

for a particular asset – that is, the speed at which the price is changing.
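As a rate of change, momentum over an n-period lookback is commonly computed as the difference between the current price and the price n periods earlier; the 10-period default below is an illustrative assumption:

```python
def momentum(prices, n=10):
    """n-period momentum: current price minus the price n periods earlier.
    The first n values are undefined (None) because no lookback exists yet."""
    return [None] * n + [prices[i] - prices[i - n] for i in range(n, len(prices))]
```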

• RSI

The relative strength index (RSI) is a technical momentum indicator that compares the

magnitude of recent gains to recent losses in an attempt to

determine overbought and oversold conditions of an asset. It is calculated using the following

formula:

RSI = 100 – 100/ (1 + RS*) Equation (65) – RSI General Equation

*Where RS = Average of x days’ up closes / Average of x days’ down closes.
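Equation (65) translates into code directly. The sketch below uses simple averages of up and down closes over the last x days; x = 14 is a common but assumed default, and the division-by-zero guard for periods with no down closes is an implementation choice:

```python
def rsi(closes, x=14):
    """Relative strength index over the last x closes:
    RSI = 100 - 100 / (1 + RS), where RS = avg up-close / avg down-close."""
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    window = changes[-x:]                         # the most recent x changes
    avg_up = sum(c for c in window if c > 0) / x
    avg_down = sum(-c for c in window if c < 0) / x
    if avg_down == 0:                             # no losing days: fully overbought
        return 100.0
    rs = avg_up / avg_down
    return 100.0 - 100.0 / (1.0 + rs)
```

A series that only rises gives RSI = 100 (overbought), while one alternating equal gains and losses gives RSI = 50.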


7.3. Data pre-processing

All input and output data belonging to the neural network were pre-processed. Several

pre-processing techniques were adopted in order to ensure that the neural network

learned quickly and gave better performance.

The first pre-processing algorithm:

Process the input and target data by deleting unnecessary columns from the data file,

keeping only the date, open, low, high, close and volume columns, and by deleting rows

that contain constant values. Constant values do not provide the neural network with any useful information.

The second pre-processing algorithm:

Check the length of the downloaded file: it must contain at least 100 data rows, i.e. at

least 100 days for daily input, 100 weeks for weekly input and 100 months for monthly

input, to give the ANN the chance to train well and predict the price with minimum error.

The third pre-processing algorithm:

Add four columns at the end of the data file and use them as inputs alongside the fundamental

inputs already in the file (open, low, high, close, volume). These columns hold the calculated

technical indicators (Momentum, RSI, MACD and MACD Signal), so nine inputs were used in total.


The fourth pre-processing algorithm:

Reduce and scale each trend output price between its minimum and maximum values using the

log-sigmoid function, scaling the trend output to between 0 (Down) and +1 (Up). There are

two main reasons for scaling all network input data to between 0 and 1 (or sometimes -1 and 1).

The reason for pre-processing neural network input trend data to values between 0 and 1 is

the way sigmoid functions squash the input data. The hyperbolic tangent or log-sigmoid

transfer function squashes all neuron activation levels to values between -1 and +1 or

between 0 and 1 respectively. However, if large positive or negative inputs are used without

pre-processing, these values end up saturating the function (i.e. becoming so close to the

limit of the function that, due to rounding, they effectively reach the function limit). For

example, when using a hyperbolic tangent function, once an input value reaches 10, its

hyperbolic tangent equals 0.999999995. Therefore, in order to keep input values within a

meaningful range, the data must be pre-processed.
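The saturation effect can be checked numerically. The sketch below (helper names are illustrative) shows that moderate inputs remain distinguishable after squashing, while large unscaled inputs all collapse onto the function limit:

```python
import math

def logsig(x):
    """Log-sigmoid transfer function: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Moderate inputs stay distinguishable after squashing...
print(round(math.tanh(1.0), 6), round(logsig(1.0), 6))

# ...but large unscaled inputs saturate: tanh(10) is already within 5e-9 of 1,
# and tanh(500) is indistinguishable from 1.0 in double precision.
print(math.tanh(10.0), math.tanh(500.0))
```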

The fifth pre-processing algorithm:

Eliminate the first row of the data file, which contains the header, so that it is not counted

in the training process.

The sixth pre-processing algorithm:

Check the file format: if it is not in the standard form (date, open, low, high, close,

volume), the file is refused.
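The length and format checks (the second and sixth algorithms above) can be sketched as a single validation step. The column order and the 100-row minimum come from the text; the function names and CSV-reading details are assumptions:

```python
import csv

REQUIRED_HEADER = ["date", "open", "low", "high", "close", "volume"]
MIN_ROWS = 100   # at least 100 days / weeks / months, depending on frequency

def validate_rows(rows):
    """Accept the file only if the header matches the standard form and
    at least 100 data rows follow it; otherwise the file is refused."""
    if not rows or [h.strip().lower() for h in rows[0]] != REQUIRED_HEADER:
        return False                      # sixth algorithm: non-standard format
    return len(rows) - 1 >= MIN_ROWS      # second algorithm: length check

def validate_file(path):
    with open(path, newline="") as f:
        return validate_rows(list(csv.reader(f)))
```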

The scaling algorithm used on all input series meant that the network output was also scaled.

Therefore, upon completion of the neural network calculations, a reverse scaling algorithm

was then run on the output data to effectively un-scale the output.
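The paired scale/un-scale step can be sketched as follows; the [0, 1] target range follows the text, while the function names and min-max form are assumed implementation details:

```python
def fit_scaler(values):
    """Record min and max so the same transform can be inverted later."""
    return min(values), max(values)

def scale(values, vmin, vmax):
    """Map raw values into [0, 1] before feeding them to the network."""
    span = (vmax - vmin) or 1.0           # guard against a constant series
    return [(v - vmin) / span for v in values]

def unscale(scaled, vmin, vmax):
    """Reverse scaling: map network outputs back to price units."""
    span = (vmax - vmin) or 1.0
    return [vmin + s * span for s in scaled]
```

Storing (vmin, vmax) at scaling time is what makes the reverse pass possible once the network has produced its scaled outputs.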

7.4. Frequency of data

There is limited guidance on the optimal frequency of input data. Kaastra and Boyd

(1996) note that:

The data frequency depends upon the goals and objectives of the researcher. The typical

off-floor trader in the stock or commodity futures markets would no doubt utilize daily data

if the neural network forms part of an overall trading system. An investor with a longer-term

horizon may utilize monthly or weekly data as an input to the neural network… (p.220)

Hall (1994) reported success in using weekly data as input to a neural network that beat its

benchmark index. Hall (1994) also outlines the problems associated with determining the

correct predictive horizon for an individual neural network:

The evolutionary nature of the stock-selection problem and the possibility of unmanageable

external events make long-term forecasting unreliable. The large amount of noise found in

this kind of system also makes short-term predictions unreliable. In between, there is some

future period for which good forecasts can be made. Finding the proper predictive period is a

crucial task when modelling evolutionary, complex non-linear dynamic (NLD) systems such as the

stock-selection problem; an estimate of the predictive period must be made by experimentation

over many stages or cycles… For relationships of this complexity, there is no true optimum.

The only technique for finding the proper predictive period is repeated simulation of the

whole system, including simulated trading, over comprehensive time periods. (p.60)

For the purposes of this research, weekly data have been used as inputs to predict the price

of the stock at t + 4 (the stock's price in four weeks' time).

7.5. Training, testing and validation sets

To implement the artificial neural network, the data set must be partitioned into three

groups: a training set, a validation set and a testing set. Some authors in the field

conflate the terms validation and testing data sets (cf. Kaastra and Boyd, 1996; Beale,

Hagan and Demuth, 2011). To remove all ambiguity, each of the data-set divisions used in

this research is defined consistently with Beale et al. (2011):


– Training set: the set of data used for calculating and updating the network weights

and biases.

– Validation set: the set of data used for monitoring the error during training.

Normally, during training, the error on the validation set declines until over-fitting

begins to occur, at which point the error begins to rise again. Learning parameters

are set to ensure that over-fitting does not occur.

– Test set: the out-of-sample set of data used once training is

finished to test how accurate the network is and, in this manner, to compare network performance.

A good choice of training-data window is vital because, "If the base training window is

too long the model will be slow to react to state changes, but if the training window is

too short, the model may overreact to noise" (Hall, 1994, p.63).

While a static system can be effectively modelled using a random data-division technique,

time-series prediction requires the use of moving-window testing. Moving-window

(walk-forward) testing has been described as follows (Kaastra and Boyd, 1996):

Walk-forward testing involves dividing the data into a series of overlapping

training-validation-testing sets. Each set is moved forward through the time series…

Walk-forward testing attempts to simulate real-life trading and tests the

robustness of the model through its frequent retraining on a large out-of-sample

data set. In walk-forward testing, the size of the validation set drives the

retraining frequency of the neural network. Frequent retraining is more time

consuming, but it allows the network to adapt more quickly to changing market

conditions. (p.223)


Walk-forward testing works as follows. The first window (window #1) partitions the first

portion of the data into training, validation and testing sets. To move the examination

further along the time series, the system then forms a new training, validation and testing

set. The walk-forward procedure continues until the end of the desired testing period.
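The walk-forward procedure described above can be sketched as an index generator; the window lengths and the step size below are illustrative assumptions, not values taken from the text:

```python
def walk_forward_windows(n_obs, n_train, n_val, n_test):
    """Yield (train, validation, test) index ranges; the whole window is
    advanced by the test-set length so every observation is tested once."""
    start = 0
    while start + n_train + n_val + n_test <= n_obs:
        train = range(start, start + n_train)
        val = range(start + n_train, start + n_train + n_val)
        test = range(start + n_train + n_val, start + n_train + n_val + n_test)
        yield train, val, test
        start += n_test                    # slide the whole window forward

for i, (tr, va, te) in enumerate(walk_forward_windows(40, 20, 5, 5), 1):
    print(f"window #{i}: train {tr.start}-{tr.stop - 1}, "
          f"val {va.start}-{va.stop - 1}, test {te.start}-{te.stop - 1}")
```

Each successive window retrains the network on more recent data, which is what allows the model to adapt to changing market conditions.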

The walk-forward methodology was used for all testing. Selecting the right training-window

length is critical because the training window acts as the "memory" of the network: the

network learns the relationships between the input variables and the output from the

patterns contained in the training window's data (Deboeck, 1994). The literature gives

little guidance, in terms of either theory or heuristics, for the correct selection of

training periods. Hall (1994) states that:

“Commonly the training window stays fixed in length as it slides through time. When the

training data window is long, the network is better able to discriminate between noise and

the true system response. Consequently, the accuracy of the system often improves as the

training window is lengthened. Conversely, if the present state of the network remains

constant, the minimum length of the historical training window determines how the model

will react during state changes. If the minimum training window is too long, the model will

be slow to react to state changes. Since much of the opportunity for beating the market

occurs early in the state changes, the effectiveness of the total trading system will be

low. But if the training window is too short, the model may overreact to noise. It may also

overreact to the fluctuation that typically occurs during state changes. In effect, the

model becomes unstable during these periods. An investor who responds to an unstable model

will be whipsawed, rapidly switching back and forth between two sorts of portfolios at

precisely the wrong time. Again, the effectiveness of the total trading system will be

low”. (pp.62-63)


To determine the suitable length of the training window, several networks were created

for each stock so that the following training periods could be tried and tested:

7.6. Neural network paradigm/structure

The ANN structure adopted consisted of 3-layer network which comprises the input layer, a

hidden layer, and the output layer. This system is a feed-forward with propagation network.

A schematic diagram of the neural network adopted is shown in Figure (7.1) Schematic

diagram of artificial neural network implementation. The fully feed-forward structure was

determined based on the findings of two studies. The first was a comparative study by Hallas

and Dorffner (1998) that looked at the use of feed-forward networks and recurrent networks

for the prediction of 8 different types of non-linear time series data. This study compared the

performance of three different types of feed-forward networks and seven types of recurrent

networks at predicting five different non-linear time series data. Hallas and Dorffner (1998)

concluded that the feed-forward networks generally performed better and that their study

pointed to “serious limitations of recurrent neural networks applied to nonlinear prediction

tasks” (p.647). The other study by Dematos, Boyd, Kermanshahi, Kohzadi and Kaastra

(1996) compared the results of using recurrent and feed-forward neural networks to predict

the Japanese yen/US dollar exchange rate. This study also concluded that despite the relative

simplicity of the feed-forward network structure, the feed-forward network outperformed the

recurrent network (Dematos et al., 1996).

For each of the stocks and portfolios in the data set there are 9 inputs. 5 of the inputs are

fundamental inputs and 4 of the inputs are technical indicators. The network was configured

so that during the learning period, the network would try to predict the price of each stock in


four weeks’ time. For example, at the beginning of the learning period (t=0), all 9 inputs

(including the stock price) were read into the network. The network then simulated what the

stock price would be, provided at least 100 days of data, i.e. 24 weeks, were available

(t ≥ 24). The network then compared the simulated price to the real price. The learning rate

was set to 0.3. The resulting error was back-propagated through the network and the process

was repeated using the next input vector until the error fell below a small tolerance delta.
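A minimal numerical sketch of this training step follows. The nine inputs, the learning rate of 0.3 and the sigmoid squashing come from the text; the hidden-layer size, the random seed, the single illustrative input vector and the scaled target are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 9, 5          # 9 inputs as in the text; hidden size assumed
lr = 0.3                       # learning rate from the text

W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))
W2 = rng.normal(0.0, 0.5, n_hidden)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target):
    """One forward pass and one back-propagation weight update."""
    global W1, W2
    h = sigmoid(W1 @ x)                    # hidden-layer activations
    y = sigmoid(W2 @ h)                    # predicted scaled price, in (0, 1)
    err = y - target
    g_out = err * y * (1.0 - y)            # output delta
    g_hid = g_out * W2 * h * (1.0 - h)     # hidden deltas
    W2 = W2 - lr * g_out * h
    W1 = W1 - lr * np.outer(g_hid, x)
    return 0.5 * err**2                    # squared error for this pattern

x = rng.random(n_in)                       # one scaled 9-element input vector
target = 0.8                               # scaled price four weeks ahead (illustrative)
losses = [train_step(x, target) for _ in range(200)]
```

Repeated calls drive the squared error toward the stopping tolerance; in the actual system each call would use the next input vector in the training window.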

7.7. Number of hidden layers

The use of hidden layers is what gives artificial neural networks the ability to generalize

(Kaastra & Boyd, 1996). From a theoretical perspective, a back-propagation network with a

sufficient number of hidden neurons is able to accurately model any continuous function. The

use of multiple hidden layers can improve network performance up to a point; however, the

use of too many hidden layers increases computation time and leads to overfitting (Kaastra

& Boyd, 1996). Kaastra and Boyd (1996) have explained how neural networks overfit as

follows:

“Overfitting occurs when a forecasting model has too few degrees of freedom. In

other words, it has relatively few observations in relation to its parameters and

is therefore able to memorize individual points rather than learn the general

patterns. In the case of neural networks, the number of weights, which is related

to the number of hidden neurons, and the size of the training set (the number of

observations) determine the likelihood of overfitting. The greater the number of

weights relative to the size of the training set, the greater the ability of the

network to memorize the idiosyncrasies of individual observations; generalization

to the validation set is then lost, and the model is of little use in actual

forecasting” (p.225)

A network that suffers from overfitting is generally considered to be of little practical use (Hall, 1994):

“In general, for evolutionary, complex NLD non-linear dynamic systems, the

accuracy of a model in explaining some local condition is inversely proportional to its

usefulness in discovering and explaining future states.” (p.57)

In the absence of an established theoretical framework for selecting the number of hidden

layers, the heuristic of using a maximum of two hidden layers is generally considered


appropriate (Kaastra & Boyd, 1996). For the purposes of this research, a single hidden layer

has been utilized.

7.8. Number of hidden neurons

There is no established method for selecting the optimum number of neurons for the hidden

layer. Previous research in the area has relied on testing to determine the optimum number

for the particular network in question (Tilakaratne et al., 2008). Similarly, Yoon and

Swales (1991) found that “the performance improved as the number of hidden units

increased, up to a certain point… Increasing the number of hidden units beyond this point

produced no further improvement but impaired the network’s performance” (p.491).

Some heuristics have been proposed by previous researchers (refer Table 7 Heuristics to

estimate hidden layer size).
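Heuristics of this kind typically express the hidden-layer size as a simple function of the input and output counts. The formulas below are commonly cited rules of thumb from the neural-network forecasting literature and are illustrative only; they are not necessarily the heuristics listed in Table 7:

```python
import math

def hidden_size_candidates(n_inputs, n_outputs):
    """Commonly cited rules of thumb for hidden-layer size (illustrative;
    not necessarily the heuristics listed in Table 7)."""
    return {
        "2n + 1":           2 * n_inputs + 1,
        "geometric mean":   round(math.sqrt(n_inputs * n_outputs)),
        "mean of in/out":   round((n_inputs + n_outputs) / 2),
        "three-quarters n": round(0.75 * n_inputs),
    }

print(hidden_size_candidates(9, 1))   # the 9-input, 1-output network used here
```

In practice, each candidate size is trained and compared on the validation set, consistent with the testing-based approach described above.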
