Issues with implementing the neural net

Algorithms

Refactored code

So far I’ve been (naively) making a new symbol for every node in the neural net. Then, whenever I want to change the number of input or hidden nodes, I have to do it by hand. This is time-consuming, and I don’t like it. So I spent about 2 hours trying to figure out how to generate new symbols automatically in Common Lisp when it hit me: I could just use arrays! I have now refactored my NN code so that I have vectors for the input nodes, the hidden nodes, and the weights from the hidden nodes to the single output node, plus a 2-dimensional array for the weights from the input nodes to the hidden nodes. I’ve also set up the code that creates and populates the nodes with random values (for the weights) and functions (for the nodes) so that the number of entries depends on the sizes of the arrays, and the sizes of the arrays depend on two parameters: the global variables “numberofinputnodes” and “numberofhiddennodes”. With this change, I can go from a (3-5-1) NN to a (3-20-1) NN just by changing “numberofhiddennodes”. Simple, easy, automated, and straightforward. I like this.

One note: the NN is always going to be a 3-layer, fully-connected backpropagation network with a single continuous output. That underlying architecture doesn’t change; only the number of nodes in the hidden layer does.
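
For reference, here is roughly what the new array-based setup looks like. This is only a sketch: the sizes and the random range are placeholders rather than the values from my actual file, and “randomize-weights” is just an illustrative name. The names numberofinputnodes, numberofhiddennodes, weights-1, and weights-2 are the ones used in the code further down.

;; Network size parameters -- changing these resizes everything else.
(defparameter numberofinputnodes 3)
(defparameter numberofhiddennodes 5)

;; weights-1: input -> hidden weights (2D); weights-2: hidden -> output weights (1D)
(defparameter weights-1 (make-array (list numberofinputnodes numberofhiddennodes)))
(defparameter weights-2 (make-array numberofhiddennodes))

;; Populate the weights with random values; the [-1, 1) range here is a placeholder.
(defun randomize-weights ()
  (loop for i below numberofinputnodes do
        (loop for j below numberofhiddennodes do
              (setf (aref weights-1 i j) (- (random 2.0) 1.0))))
  (loop for j below numberofhiddennodes do
        (setf (aref weights-2 j) (- (random 2.0) 1.0))))

(randomize-weights)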

Anonymous functions (Lambda-functions), closures, and macros

Earlier, I would hand-code every input- and hidden-node function whenever I changed the number of nodes. Since I want to automate this, hand-coding cannot continue. Seeing as it is considered proper style to use macros only when a function cannot do the job easily, I began writing code that populates each element of the node arrays with a function that calculates and returns that node’s output.

In the process of doing that, I found that I could not just use a plain lambda-function; it would have to be a lambda-function that closes over the right variables. The other alternative would have been to use a macro, and I didn’t feel like doing that. So I used the great Lispdoc website to guide me to an explanation of how closures work and how to use them in this context. Finding what I needed was quick, and I solved the problem. I wish there were a way around needing the closure, but really, using one here has no discernible performance cost. The code is below.


;; Define input-nodes
(defparameter input-node (make-array numberofinputnodes))

(defun define-input-nodes (list-of-set-indexes)
  (loop for j below (length input-node)
        do (let ((i (elt list-of-set-indexes j)))
             (setf (aref input-node j)
                   (lambda (dataset input-index)
                     (aref dataset input-index i))))))

(define-input-nodes '(0 1 3))
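
;; Illustration (an added note, not part of the original code): with the set
;; indexes (0 1 3) above, input-node 0 reads column 0, node 1 reads column 1,
;; and node 2 reads column 3 of each dataset row. For example,
;; (funcall (aref input-node 2) dataset 42) returns (aref dataset 42 3),
;; because that closure captured i = 3 when it was created.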

;; Define hidden nodes
(defparameter hidden-node (make-array numberofhiddennodes))

(defun define-hidden-nodes ()
  (loop for i below (length hidden-node)
        do (let ((i i)) ; Rebind i so each closure captures its own value of the loop variable.
             (setf (aref hidden-node i)
                   (lambda (dataset input-index)
                     (tanh (+ (* (aref weights-1 0 i)
                                 (funcall (aref input-node 0) dataset input-index))
                              (* (aref weights-1 1 i)
                                 (funcall (aref input-node 1) dataset input-index))
                              (* (aref weights-1 2 i)
                                 (funcall (aref input-node 2) dataset input-index)))))))))

(define-hidden-nodes)

As you can see, the code for the hidden nodes is still hard-coded for 3 input nodes. This can be changed with a macro, as I have done with the output-node code; in fact, I could change that macro to work for both layers. Good stuff. Just writing all this down for an update helps me think through the improvements I could make. Anyway, here is the code for the output node, with the helper macro. I think there could be some way to do this with a function instead, but it escapes me right now; the macro works, and I just want to keep adding features to the code. I can always refactor later on. (A loop-based sketch of the function idea follows the macro below.)


;; Macro to populate the output-node for all existing hidden-nodes
(defmacro output-node ()
  (let ((a ()))
    (loop for i below numberofhiddennodes
          do (push `(* (aref weights-2 ,i)
                       (funcall (aref hidden-node ,i) dataset input-index))
                   a))
    `(tanh (+ ,@a))))

;; Output node
(defun node-output (dataset input-index)
  (output-node))
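
Actually, now that I’ve written it out, a plain function could probably do the same job by summing over the hidden nodes at run time instead of unrolling the sum at macro-expansion time. A quick, untested sketch of that idea:

;; Untested sketch: a function-only output node that loops over the hidden
;; nodes at run time, so there is no macro and no unrolling at expansion time.
(defun node-output (dataset input-index)
  (tanh (loop for i below (length hidden-node)
              sum (* (aref weights-2 i)
                     (funcall (aref hidden-node i) dataset input-index)))))

The same run-time loop over (length input-node) would also get rid of the hard-coded 3 inputs in define-hidden-nodes.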

This concludes the algorithms I’ve been working on so far. I’ve been refactoring my code to make it easier to change the NN, and I’ve learned some things along the way. Has this actually helped me? I am sad to say that the answer is “no”. I have been having issues with the NN’s performance on the training set itself. After some time, training stops reducing the error; in fact, the error starts to increase! If that happened on the validation and test sets, that would be fine, but it’s not fine on the training set itself. I thought I wasn’t giving the NN enough learning capacity (hidden nodes), but that is not the case: the NN with 20 hidden nodes actually performs worse than the one with 5. This leads me to the next section.

Data

Output

A NN works by learning from the data, so it is susceptible to “garbage in, garbage out”. The output I get from the NN has the same shape of distribution as the example outputs, but over a much smaller range. This is probably because it ignores the outliers, which is acceptable as long as I use a satisfactory stop-loss and position-sizing strategy. The real problem is that the two distributions aren’t even centred at the same values. That makes things a bit iffy, and I would like to improve upon it.

In all the articles and books I’ve read so far about this problem domain, the researchers manage to build a pretty good NN with few training examples (called exemplars). One, at least, claims to have done it for the S&P 500 daily price with only 12 exemplars! That’s pretty good. That’s amazing. I’m using 5000-10000 exemplars and still don’t quite get there. The other difference is that those researchers use a lot more inputs than I do. I have only 3 right now: the ADX(14), the ATR(20), and the SMA(20) minus the present close. In the David Skapura book, the researchers use the Stochastic(9) indicators, the ADX(18), the MACD(12,26), the current price, and the change from 5 days prior to predict the price 5 periods (days) into the future. That is six inputs to predict one output; I’m currently using 3.

I would like to add more inputs, and the obvious move would be to add the ones the researchers have shown to work. I should just switch to S&P 500 daily data and replicate their results. But I’m too thick-headed for that. I don’t want to simply copy other people’s work; I want to learn. I want to know why they selected those input vectors, because in the future I don’t want to be limited to copying. I want to be able to say objectively that I chose a particular input vector because it is useful (and mathematically provable to be so), not because “some other people used it before”.

That leads me to the problem of selecting which input vectors to use. There are, after all, a whole slew of technical indicators out there. It’s no use selecting a bunch that are correlated with each other, since that doesn’t give the NN additional information to work with. It’s like supplying height in both inches and centimetres: redundant, and a waste of time and processing power. The NN should be given useful information, information that adds new data rather than rehashing it. In other words, each new input vector should be “orthogonal” to the current ones. From my reading, this can be done with something called “principal component analysis”. Learning how to do this would be a valuable use of my time while I wait for another NN book to arrive.
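
As a first, simpler check (my own rough sketch, separate from proper PCA, and the function names are just ones I made up), I could at least measure the correlation between two candidate indicator series; if the result is close to +1 or -1, the second indicator is mostly the inches-versus-centimetres situation again.

;; Rough, untested sketch: Pearson correlation between two candidate input
;; series, stored as vectors of equal length. Values near +1 or -1 mean the
;; second indicator adds little new information.
(defun series-mean (v)
  (/ (loop for x across v sum x) (length v)))

(defun series-correlation (a b)
  (let* ((ma (series-mean a))
         (mb (series-mean b))
         (cov (loop for x across a
                    for y across b
                    sum (* (- x ma) (- y mb))))
         (sa (sqrt (loop for x across a sum (expt (- x ma) 2))))
         (sb (sqrt (loop for y across b sum (expt (- y mb) 2)))))
    (/ cov (* sa sb))))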

The other NN book

A few weeks ago I was browsing online at the Chapters and Amazon websites to see if there was anything published on NNs after the 90s. To my pleasant surprise, there were. And to my absolute glee, I found a book titled “Foreign-exchange-rate forecasting with Artificial Neural Networks”. I was in the public library at the time, and I’m very proud to have restrained myself from doing cartwheels. That book’s title says exactly what I want to do, and it was published in August 2007. That book practically has my name written on it. I’ve gone through the Google preview of it, and I’m excited to read it. One of the sections I’m most looking forward to is the section on data processing.

On a sober note: this book was published when I was just entering my second year at Queen’s University at Kingston. I could have been doing this while I was there instead of right now. If I had been where I am now w.r.t. programming 3.5 years ago, I’d probably be in the Bahamas right now. Heck, I’d probably have done some shitty work after university (after! I wouldn’t even have bothered finishing university!), saved some money, and I’d have been fine after that. The decision to go to university was foolish, and I remained an idiot while I was there.

We make do with what we have right now. And right now I’ll figure out how to do principal component analysis (PCA). Time to get my math on. I should check the library too, since I doubt there is enough in-depth information on the internet. Perhaps I’ll code the whole thing in Common Lisp, but I might not if the R statistical system can already do this. If that’s the case, I hope R isn’t too difficult to learn.
