Project Theodore: Day 5

Working with Ecomagination has shown us that it takes a lot of hard work to rally support and demonstrate to people the advantages and feasibility of our project. It has been great to receive the large amount of positive feedback from those of you who have commented on our project and from those people we have spoken to personally. We greatly appreciate the continued support.

Some users have asked about the feasibility of this project, as well as specifics about the algorithms we intend to implement. We hope this post makes the technical aspects of our idea clearer. The field that deals with the questions we are hoping to address is known as Machine Learning (see additional info). It has been rigorously formalized in mathematics and theoretical computer science over the past 50 years, but it does not exist only in a theoretical vacuum. In the past 10-15 years, as machines have gained enormous computing power, machine learning algorithms have been applied to fields as varied as stock trading, handwriting recognition, audio perception, gene sequencing and analysis, search engine optimization, and many others.

We are essentially trying to develop a function which, using observed data on various features and on grid usage, can learn how future consumption demand will behave. For that reason, the machine learning algorithms we are focused on are pattern classifiers. Genetic Programming, Neural Networks, Support Vector Machines, and Bayesian Learning are all approaches that generate classification functions. They take as input a set of training samples, chosen because they exhibit the features we are interested in targeting, and converge on a statistically optimal classifier (by optimal we mean one that minimizes error on the training samples). We can then measure the classifier's performance on new test data and continue to improve it by reconfiguring our classification algorithms. Although the flavor of the theory is different in each of these approaches, they all essentially highlight important features in the dataset, such as traffic or interesting weather patterns.
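To make the train-then-test loop above a little more concrete, here is a minimal sketch in Python of one of the approaches mentioned, a simple Bayesian classifier (Gaussian naive Bayes). Everything specific in it is an assumption made for illustration: the data is synthetic, the two features (temperature and hour of day), the "high"/"low" demand labels, and the function names are hypothetical, and none of it is the project's actual data or model.

```python
import math
import random

def make_samples(n, seed):
    """Generate synthetic ((temperature_c, hour_of_day), label) pairs.

    The labeling rule is an invented stand-in for real grid data:
    hot afternoons and evenings tend to mean "high" demand.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        temp = rng.uniform(0.0, 40.0)
        hour = rng.uniform(0.0, 24.0)
        peak = 1.0 if 12.0 <= hour <= 20.0 else -1.0
        score = (temp - 20.0) / 10.0 + peak + rng.gauss(0.0, 0.5)
        samples.append(((temp, hour), "high" if score > 0 else "low"))
    return samples

def train(samples):
    """Estimate a prior and per-feature mean/variance for each class."""
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append(features)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(samples), stats)
    return model

def predict(model, features):
    """Return the class with the highest log posterior probability."""
    best_label, best_logp = None, -math.inf
    for label, (prior, stats) in model.items():
        logp = math.log(prior)
        for x, (mean, var) in zip(features, stats):
            logp += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(model, f) == y for f, y in samples) / len(samples)

# Train on one set of samples, then evaluate on samples the model has not seen.
training = make_samples(400, seed=1)
testing = make_samples(200, seed=2)
model = train(training)
print("training accuracy:", accuracy(model, training))
print("test accuracy:", accuracy(model, testing))
```

The gap between training and test accuracy is the number to watch: a classifier that does well on its training samples but poorly on new data has memorized the samples rather than learned the pattern.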

We believe that this goal is very feasible, because statistical learning algorithms have been highly successful both in the academic literature and in real-world applications. Our aim at this point is to tailor the general algorithms to the specific problem of grid consumption and to benchmark the classifiers we generate. In order to do this we must compile enormous datasets for all the features we aim to address, preprocess them as input for our programs, and refine our models to achieve higher prediction accuracy.
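As a rough sketch of what "preprocess" and "benchmark" can mean in practice, the Python below standardizes raw feature columns (so that features measured in very different units are comparable) and computes a majority-class baseline that any useful classifier should beat. The feature values, labels, and function names are hypothetical and chosen only for illustration.

```python
import statistics
from collections import Counter

def fit_scaler(rows):
    """Compute (mean, standard deviation) for each feature column."""
    columns = list(zip(*rows))
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    return [(statistics.fmean(c), statistics.pstdev(c) or 1.0) for c in columns]

def transform(rows, scaler):
    """Rescale each feature to zero mean and unit variance."""
    return [tuple((x - m) / s for x, (m, s) in zip(row, scaler)) for row in rows]

def majority_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return sum(y == most_common for y in test_labels) / len(test_labels)

# Hypothetical raw features: (temperature in Celsius, vehicles per hour).
raw_train = [(31.0, 1200.0), (18.0, 400.0), (25.0, 900.0), (12.0, 300.0)]
raw_test = [(28.0, 1000.0)]

# Fit the scaler on training data only, then apply it to both sets,
# so that no information from the test data leaks into training.
scaler = fit_scaler(raw_train)
scaled_train = transform(raw_train, scaler)
scaled_test = transform(raw_test, scaler)

baseline = majority_baseline(
    ["high", "low", "high", "low", "low"],
    ["low", "high", "low", "low"],
)
print("scaled test row:", scaled_test[0])
print("majority-class baseline accuracy:", baseline)
```

Comparing a trained classifier against a baseline like this is one simple way to check that it has learned something beyond the class frequencies.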

Although this is a vast simplification of the work in machine learning theory, we hope you now have a better understanding of the inner workings of Project Theodore. We will post again soon about more of the technical aspects of the project, as well as what we are learning by participating in Ecomagination.

Here are some technologies we are working with:
http://en.wikipedia.org/wiki/Machine_learning
http://en.wikipedia.org/wiki/Genetic_programming
http://en.wikipedia.org/wiki/Support_vector_machine
http://en.wikipedia.org/wiki/Bayesian_network
http://www.nvidia.com/object/what_is_cuda_new.html
http://www.python.org/
http://en.wikipedia.org/wiki/C%2B%2B

Project Theodore

Friday, July 30, 2010
