ellyn is fast because it uses a C++ library to do most of the computation. Once you have it installed, you can use it just like any other scikit-learn estimator, which makes it easy to do cross-validation, ensemble learning, or to build any other kind of ML pipeline. Follow the installation guide to get it up and running.
These instructions are written for a default Anaconda3 Python installation, but you can easily modify the paths to point to your own installation.
The hairiest part of the installation is getting Boost installed with Boost.Python. If you don’t have Boost yet, run
wget "https://sourceforge.net/projects/boost/files/boost/1.62.0/boost_1_62_0.tar.gz"
tar -xzf boost_1_62_0.tar.gz
# install boost
# navigate to the installation folder
cd boost_1_62_0
# bootstrap boost python builder
./bootstrap.sh --with-libraries=python --with-python-root=/home/$USER/anaconda3
# add symbolic link to python3.5 include file
ln -s /home/$USER/anaconda3/include/python3.5m /home/$USER/anaconda3/include/python3.5
# build boost python
./b2 --with-python
Eigen is a sweet matrix library for C++. If you are on Debian / Ubuntu, you can install it via
sudo apt-get install libeigen3-dev
Otherwise, install it from the Eigen website. On Linux systems it should be in /usr/include/eigen3, but if it’s somewhere else, edit ellen/Makefile to point to it.
Now you can build the C++ library ellen. Go to this repo in a terminal, then type
cd ellyn/ellen
make
In a python script, import ellyn:
from ellyn import ellyn
ellyn uses the same nomenclature as sklearn supervised learning modules. You can initialize a learner in Python as:
learner = ellyn()
or specify the generations, population size and selection algorithm as:
learner = ellyn(g = 100, popsize = 25, selection = 'lexicase')
Given a set of data with variables X and target Y, fit ellyn using the fit method:
learner.fit(X, Y)
You have now learned a model for your data. Predict your model’s response on a new set of variables as
y_pred = learner.predict(X_unseen)
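Because the learner follows the fit/predict convention, any routine written against that interface should work with it. The sketch below is a minimal hold-out evaluation under that assumption. `MeanRegressor` and `holdout_mse` are hypothetical names introduced here so the example runs without ellyn installed; they are not part of ellyn.

```python
import random


class MeanRegressor:
    """Hypothetical stand-in learner: always predicts the mean training target."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def holdout_mse(learner, X, y, test_fraction=0.25, seed=0):
    """Fit on a random training split and return the mean squared error
    on the held-out rows. Works with any object exposing fit and predict."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    learner.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
    y_pred = learner.predict([X[i] for i in test_idx])

    errors = [(y[i] - p) ** 2 for i, p in zip(test_idx, y_pred)]
    return sum(errors) / len(errors)


# Toy data: y = 2x + 1 on a single variable.
X = [[float(i)] for i in range(20)]
y = [2.0 * row[0] + 1.0 for row in X]
mse = holdout_mse(MeanRegressor(), X, y)
```

With ellyn installed, the stand-in could presumably be swapped for `ellyn(g = 100, popsize = 25, selection = 'lexicase')`, most likely with NumPy arrays in place of the nested lists used here.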
Call ellyn from the terminal as
python -m ellyn.ellyn data_file_name -g 100 -p 50 -sel lexicase
Run python -m ellyn.ellyn --help to see the available options.
ellyn uses a stack-based, syntax-free, linear genome for constructing candidate equations.
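To make the idea of a stack-based linear genome concrete, here is a small illustrative sketch. It is not ellyn's actual implementation: the instruction set `OPS`, the function `evaluate`, and the rule that an operator is skipped when the stack holds too few arguments are assumptions chosen to show how a flat list of tokens can always be executed without a syntax check.

```python
import math
import operator

# Hypothetical instruction set: token -> (arity, function).
OPS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "sin": (1, math.sin),
}


def evaluate(genome, variables):
    """Execute a linear genome on one sample and return the top of the stack.

    genome:    list of tokens (operator names, variable names, or constants)
    variables: dict mapping variable names to numeric values
    Returns None if the stack is empty after execution.
    """
    stack = []
    for token in genome:
        if token in OPS:
            arity, fn = OPS[token]
            if len(stack) < arity:
                continue  # too few arguments: skip the instruction
            args = stack[-arity:]  # arguments in the order they were pushed
            del stack[-arity:]
            stack.append(fn(*args))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))  # anything else is treated as a constant
    return stack[-1] if stack else None


# "x y + * 2 *": the first "*" finds only one value on the stack and is skipped,
# so the genome evaluates (x + y) * 2.
result = evaluate(["x", "y", "+", "*", "2", "*"], {"x": 3, "y": 4})
```

Under this scheme every ordering of tokens yields a valid program (possibly an empty one), which is one way to read "syntax-free": variation operators can cut and splice the genome anywhere without producing an unparseable equation.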
ellyn has been used in several publications. Cite the one that best represents your use case, or you can cite my dissertation if you’re not sure.
La Cava, W. G. (2016). “Automatic Development and Adaptation of Concise Nonlinear Models for System Identification.” Doctoral Dissertations May 2014 - current. 731.
La Cava, W., Danai, K., Spector, L. (2016). “Inference of Compact Nonlinear Dynamic Models by Epigenetic Local Search.” Engineering Applications of Artificial Intelligence. doi:10.1016/j.engappai.2016.07.004
La Cava, W., Spector, L., Danai, K. (2016). “epsilon-Lexicase selection for regression.” Proceedings of the Genetic and Evolutionary Computation Conference (GECCO). ACM, Denver, CO.