[Evogrid-dev] Simulation, Analysis and Scoring implemented

Peter Newman peter.newman at digitalspace.com
Fri Dec 18 03:48:14 UTC 2009


Well, this is a pretty noticeable milestone today. I can now run a 
simulator and an analysis daemon constantly (well, except for one or two 
lock-up bugs that I'll fix soon-ish), and produce scores that are useful.

Statistics are calculated for every frame; scores are calculated for the 
entire simulation. These are not combined into a single score, since 
some scores are more important than others and will need to be scaled 
based on the desired target.

I currently have code that produces the following statistics:
* temperature - Average temperature across the entire simulation
* avg-molecule-size - The average number of bonds this frame.
* max-molecule-size - The largest molecule size, this frame. This is 
calculated as the longest bond path between any two atoms.
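
As a rough illustration of the max-molecule-size statistic, here is a minimal 
sketch. It assumes that bonds arrive as pairs of atom ids and that "longest 
bond path" means the longest shortest path between two atoms in the same 
molecule, counted in bonds. The function name and input format are 
hypothetical, not the actual analysis daemon code.

```python
from collections import defaultdict, deque


def max_molecule_size(bonds):
    """Return the longest shortest bond path (counted in bonds) between any
    two atoms that are connected through bonds.

    `bonds` is an iterable of (atom_a, atom_b) pairs. This input format is an
    assumption made for the sketch.
    """
    # Build an undirected adjacency map from the bond list.
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    def farthest_distance(start):
        # Breadth-first search: distance in bonds from `start` to every
        # atom reachable from it.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in dist:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        return max(dist.values())

    # The largest value over all starting atoms; 0 if there are no bonds.
    return max((farthest_distance(a) for a in list(adjacency)), default=0)
```

Running one breadth-first search per atom is quadratic in the worst case, 
which should be fine for small frames but may need a smarter approach for 
large ones.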

I also calculate localized temperatures in a 2D grid, but since the 
simulation is in 3D space, this is not particularly useful currently.

From these statistics, we calculate the following scores:
* temperature-mean - The mean temperature across all the frames
* temperature-mean-change - The difference between the specified initial 
temperature and temperature-mean
* failed-frames - This is a negative score based on how many frames are 
all 0, indicating they were never simulated
* avg-avg-molecule-size - The average of avg-molecule-size across all frames
* avg-max-molecule-size - The average of max-molecule-size across all frames
* max-avg-molecule-size - The highest value recorded for avg-molecule-size
* max-max-molecule-size - The highest value recorded for max-molecule-size
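
The scores listed above could be derived from the per-frame statistics 
roughly as follows. This is only a sketch: the frame representation (one 
dict per frame), the sign of temperature-mean-change (mean minus initial) 
and the exact form of failed-frames (minus the number of all-zero frames) 
are assumptions, and compute_scores is a hypothetical name.

```python
def compute_scores(frames, initial_temperature):
    """Compute whole-simulation scores from per-frame statistics.

    `frames` is a list of dicts with the keys "temperature",
    "avg-molecule-size" and "max-molecule-size" (assumed format).
    """
    if not frames:
        raise ValueError("at least one frame is required")

    count = len(frames)
    temperatures = [f["temperature"] for f in frames]
    avg_sizes = [f["avg-molecule-size"] for f in frames]
    max_sizes = [f["max-molecule-size"] for f in frames]

    # A frame whose statistics are all 0 is treated as never simulated.
    failed = sum(1 for f in frames if not any(f.values()))

    temperature_mean = sum(temperatures) / count
    return {
        "temperature-mean": temperature_mean,
        # Assumed sign convention: positive means the run heated up.
        "temperature-mean-change": temperature_mean - initial_temperature,
        # Assumed form: one negative point per failed frame.
        "failed-frames": -failed,
        "avg-avg-molecule-size": sum(avg_sizes) / count,
        "avg-max-molecule-size": sum(max_sizes) / count,
        "max-avg-molecule-size": max(avg_sizes),
        "max-max-molecule-size": max(max_sizes),
    }
```

Note that the averages here include failed frames, as "across all frames" 
suggests, so a run with many failed frames is also pulled down in its 
avg-* scores.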

Using these scores, I think we have enough to start searching for 
"complexity". Using failed-frames to avoid parameters that crash/lock 
the simulator, and the molecule-size scores to search for ever larger 
structures, we should be able to perform a hill-climb that produces 
results that are increasingly complex (by those definitions of complexity).
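
To make the hill-climb idea concrete, here is a minimal sketch of what such a 
search might look like. Everything in it is an assumption for illustration: 
the parameter dict, the random perturbation, the evaluate callback (which in 
practice would run a simulation and score it) and the way failed-frames and 
max-max-molecule-size are combined into one objective.

```python
import random


def complexity_objective(scores):
    """Hypothetical objective: reject any run with failed frames, otherwise
    prefer larger structures."""
    if scores["failed-frames"] < 0:
        return float("-inf")
    return scores["max-max-molecule-size"]


def hill_climb(initial_params, evaluate, steps=100, step_size=0.1, seed=0):
    """Simple stochastic hill-climb over a dict of numeric parameters.

    `evaluate` maps a parameter dict to a number to maximise. A candidate
    replaces the current best only if it scores strictly higher.
    """
    rng = random.Random(seed)
    best_params = dict(initial_params)
    best_score = evaluate(best_params)
    for _ in range(steps):
        # Perturb every parameter by a small random amount.
        candidate = {
            name: value + rng.uniform(-step_size, step_size)
            for name, value in best_params.items()
        }
        score = evaluate(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score
```

Since each evaluation is a full simulation run, the number of steps is the 
real cost here. A plain hill-climb like this can also get stuck on a local 
maximum.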

The temperature scores are currently not very useful, since we are using 
the 'temperature bath' feature of GROMACS to keep the temperature near 
the initial conditions. This avoids run-away thermal increases caused by 
some issue with our generation, either bad data or exceedingly dense 
(high pressure) initial conditions. Perhaps we can use another search, for 
parameters that don't cause this.

I'm already clearly going to need better/more hardware for simulation 
purposes; analysis is much faster than simulation. Fortunately, due to 
the batch pull design of the simulator, this hardware doesn't need to be 
anything very special, and it doesn't need to be on a fast network as a 
computation grid would require. Of course, a computation grid could be 
performing batch jobs, if someone has one to spare...

Peter N

