Reproducible research

Reproducible research is quite an important topic. Once you design, prepare, and run your experiment, you should make sure it will be possible to reproduce it in the future. Ideally, anyone should be able to perform exactly the same experiment.

Around 18 years ago, I started to develop G(enetic) A(lgorithm) B(ack) P(ropagation). At that time, the layout of a Neural Network (layers, biases, and connections between neurons) was usually taken for granted. To solve a problem using a NN, you had to either use a structure described in a scientific paper or design one on your own. I decided to test a slightly different approach: I decided to evolve Neural Networks.

Each Neural Network structure evolved inside a small, isolated population maintained by a Genetic Algorithm. After some period of time, the best-fitted individuals (the ones that could solve the problem most efficiently) had a chance to migrate to other populations. This way, the best structure for a given problem grew slowly, without any external intervention. Each evolved Neural Network was supposed to perform two tasks:

– learn to solve the problem – using input patterns from the first set,
– solve the final problems – using input patterns from the second set.

In a sense, the whole process was completely unsupervised. The Neural Networks were completely random at the beginning, and over time the optimal solution emerged.
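The scheme described above (isolated populations, periodic migration of the best individuals) is usually called an island-model Genetic Algorithm. Below is a minimal sketch of that idea, not the original GABP code. Every name in it (`random_genome`, `mutate`, `toy_fitness`, `evolve_island`, `migrate`, `run`) is hypothetical, the genome is assumed to be just a list of hidden-layer sizes, and `toy_fitness` is a made-up stand-in for the real work of training a network with back propagation on the first pattern set and scoring it on the second.

```python
import random

MAX_LAYERS = 4    # assumed upper bound on hidden layers
MAX_NEURONS = 16  # assumed upper bound on neurons per layer


def random_genome(rng):
    """A genome is a list of hidden-layer sizes, e.g. [5, 3]."""
    return [rng.randint(1, MAX_NEURONS) for _ in range(rng.randint(1, MAX_LAYERS))]


def mutate(genome, rng):
    """Return a copy with one layer added, removed, or resized by one neuron."""
    child = list(genome)
    choice = rng.random()
    if choice < 0.2 and len(child) < MAX_LAYERS:
        child.insert(rng.randrange(len(child) + 1), rng.randint(1, MAX_NEURONS))
    elif choice < 0.4 and len(child) > 1:
        del child[rng.randrange(len(child))]
    else:
        i = rng.randrange(len(child))
        child[i] = min(MAX_NEURONS, max(1, child[i] + rng.choice([-1, 1])))
    return child


def toy_fitness(genome):
    """Placeholder fitness: negative distance to an arbitrary target layout.

    In a real system this would train the network on the first pattern set
    and measure how well it solves the second one.
    """
    target = [6, 3]  # arbitrary, for illustration only
    padded = genome + [0] * (MAX_LAYERS - len(genome))
    goal = target + [0] * (MAX_LAYERS - len(target))
    return -sum(abs(a - b) for a, b in zip(padded, goal))


def evolve_island(population, fitness, rng):
    """One generation: keep the better half, refill with mutated survivors."""
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    children = [mutate(rng.choice(survivors), rng)
                for _ in range(len(ranked) - len(survivors))]
    return survivors + children


def migrate(islands, fitness):
    """Ring migration: the best of each island replaces the worst of the next."""
    bests = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = list(bests[i - 1])


def run(seed=0, n_islands=4, island_size=10, generations=60,
        migrate_every=10, fitness=toy_fitness):
    """Evolve all islands and return the best genome found."""
    rng = random.Random(seed)  # a fixed seed makes the whole run repeatable
    islands = [[random_genome(rng) for _ in range(island_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [evolve_island(pop, fitness, rng) for pop in islands]
        if gen % migrate_every == 0:
            migrate(islands, fitness)
    return max((g for pop in islands for g in pop), key=fitness)


if __name__ == "__main__":
    print(run(seed=42))
```

Passing the seed in explicitly is a deliberate choice in this sketch: two runs with the same seed and parameters produce the same result, which is the kind of property that makes an experiment easy to repeat years later.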

Recently, I decided to check whether the whole thing still works. To my surprise, getting from the archive (where all sources and input files were stored) to a running state was a really simple task. Of course, it took some time to get familiar with the documentation, which, again to my surprise, was quite good. It took some time to compile things (even though it worked almost out of the box), and it took some time to set the initial parameters for the application. Anyway, what surprised me most was how little it cost to get from zero to a running application after more than seventeen years! I was able to reuse the sample data, I was able to run the experiments, and it simply worked as expected! The only difference I noticed was the time needed to evolve the optimal solution: the algorithm ran way faster.

That’s what I call research reproducibility. When somebody asks me:

“-Can I easily reproduce your experiment?”, I can give a firm and confident answer,
“-Yes, you can!”.

All I did was keep close to standards and well-established practices.

A few well-chosen test cases and a few print statements in the code may be enough.

Some programs are not handled well by debuggers: multi-process or multi-thread programs, operating systems, and distributed systems must often be debugged by lower-level approaches. In such situations, you’re on your own, without much help besides print statements and your own experience and ability to reason about code.

— The Practice of Programming – Brian W. Kernighan and Rob Pike

Make sure to look here if you are using R for your research: Reproducible Research. You can also read a little bit about the role of Software Engineers in research: here.