[Evogrid-dev] Simulation, Analysis and Scoring implemented
Peter Newman
peter.newman at digitalspace.com
Fri Dec 18 03:48:14 UTC 2009
Well, this is a pretty noticeable milestone today. I can now run a
simulator and an analysis daemon constantly (well, except for one or two
lock-up bugs that I'll fix soon-ish), and produce scores that are useful.
Statistics are calculated for every frame; scores are calculated for the
entire simulation. These are not combined into a single score, since
some scores are more important than others and will need to be scaled
based on the desired target.
I currently have code that produces the following statistics:
* temperature - Average temperature across the entire simulation
* avg-molecule-size - The average number of bonds this frame.
* max-molecule-size - The largest molecule size, this frame. This is
calculated as the longest bond path between any two atoms.
I also calculate localized temperatures in a 2D grid, but since the
simulation is in 3D space, this is not particularly useful currently.
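As a rough illustration of the two molecule-size statistics, here is a minimal sketch. It is not the actual analysis daemon code. It assumes a frame's bonds arrive as a list of atom-index pairs, it reads "longest bond path between any two atoms" as the largest shortest-path distance (counted in bonds) inside any molecule, and it ignores unbonded atoms. The function names are made up for the example.

```python
from collections import defaultdict, deque


def build_adjacency(bonds):
    """Map each atom index to the set of atoms it is bonded to."""
    adj = defaultdict(set)
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def bfs_distances(adj, start):
    """Shortest distance (in bonds) from start to every reachable atom."""
    dist = {start: 0}
    pending = deque([start])
    while pending:
        atom = pending.popleft()
        for neighbour in adj[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                pending.append(neighbour)
    return dist


def frame_statistics(bonds):
    """Compute avg-molecule-size and max-molecule-size for one frame.

    Assumption: a molecule is a connected component of the bond graph,
    and atoms with no bonds are not counted as molecules.
    """
    adj = build_adjacency(bonds)
    seen = set()
    bond_counts = []
    longest_path = 0
    for atom in list(adj):
        if atom in seen:
            continue
        members = set(bfs_distances(adj, atom))
        seen |= members
        # Every bond shows up in two adjacency sets, hence the halving.
        bond_counts.append(sum(len(adj[m]) for m in members) // 2)
        # Run a BFS from every atom to find the longest shortest path.
        for m in members:
            longest_path = max(longest_path, max(bfs_distances(adj, m).values()))
    average = sum(bond_counts) / len(bond_counts) if bond_counts else 0.0
    return {"avg-molecule-size": average, "max-molecule-size": longest_path}
```

The all-pairs BFS is quadratic in molecule size, which is fine for a sketch but would probably want something smarter for large structures.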
From these statistics, we calculate the following scores:
* temperature-mean - The mean temperature across all the frames
* temperature-mean-change - The difference between the initial
temperature specified and temperature-mean
* failed-frames - This is a negative score based on how many frames are
all 0, indicating they were never simulated
* avg-avg-molecule-size - The average of avg-molecule-size across all frames
* avg-max-molecule-size - The average of max-molecule-size across all frames
* max-avg-molecule-size - The highest value recorded for avg-molecule-size
* max-max-molecule-size - The highest value recorded for max-molecule-size
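The scores listed above could be derived from the per-frame statistics roughly as follows. This is a sketch under assumptions, not the real implementation: each frame is assumed to be a dict keyed by the statistic names, the sign of temperature-mean-change (mean minus initial) is a guess, and failed-frames is taken to be simply the negated count of all-zero frames.

```python
STAT_NAMES = ("temperature", "avg-molecule-size", "max-molecule-size")


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def simulation_scores(frames, initial_temperature):
    """Reduce per-frame statistics to whole-simulation scores.

    frames: list of dicts, one per frame, keyed by STAT_NAMES (assumed layout).
    """
    if not frames:
        raise ValueError("need at least one frame")
    temperatures = [f["temperature"] for f in frames]
    avg_sizes = [f["avg-molecule-size"] for f in frames]
    max_sizes = [f["max-molecule-size"] for f in frames]
    # A frame whose statistics are all 0 is treated as never simulated.
    failed = sum(1 for f in frames if all(f[name] == 0 for name in STAT_NAMES))
    temperature_mean = mean(temperatures)
    return {
        "temperature-mean": temperature_mean,
        "temperature-mean-change": temperature_mean - initial_temperature,
        "failed-frames": -failed,
        "avg-avg-molecule-size": mean(avg_sizes),
        "avg-max-molecule-size": mean(max_sizes),
        "max-avg-molecule-size": max(avg_sizes),
        "max-max-molecule-size": max(max_sizes),
    }
```

Note that failed frames are still included in the means here, which drags them down; whether that is desirable is a design choice this sketch does not settle.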
Using these scores, I think we have enough to start searching for
"complexity". Using failed-frames to avoid parameters that crash/lock
the simulator, and the molecule-size scores to search for ever larger
structures, we should be able to perform a hill-climb that produces
results that are increasingly complex (by those definitions of complexity).
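To make the hill-climb idea concrete, here is a small sketch of how the scores might be scaled into one objective and climbed. Everything here is hypothetical: the weights, the `evaluate` callback (which would stand in for "run a simulation with these parameters and return its scores"), and the choice of a simple random-perturbation hill climb over numeric parameters.

```python
import random


def combined_score(scores, weights):
    """Weighted sum of scores; scores without a weight are ignored."""
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())


def hill_climb(evaluate, start, weights, steps=100, step_size=0.1, seed=0):
    """Random-perturbation hill climb over a dict of numeric parameters.

    evaluate: hypothetical callback taking a parameter dict and returning
              a dict of scores (e.g. by running the simulator and analysis).
    """
    rng = random.Random(seed)
    best = dict(start)
    best_value = combined_score(evaluate(best), weights)
    for _ in range(steps):
        candidate = dict(best)
        # Nudge one randomly chosen parameter by a small amount.
        name = rng.choice(sorted(candidate))
        candidate[name] += rng.uniform(-step_size, step_size)
        value = combined_score(evaluate(candidate), weights)
        # Keep the candidate only if it strictly improves the objective.
        if value > best_value:
            best, best_value = candidate, value
    return best, best_value
```

Giving failed-frames a large weight relative to the molecule-size scores would steer the search away from parameters that crash or lock the simulator before it starts chasing larger structures.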
The temperature scores are currently not very useful, since we are using
the 'temperature bath' feature of GROMACS to keep the temperature near
the initial conditions. This avoids run-away thermal increases, which are
caused by some issue with our generation: either bad data or exceedingly
dense (high pressure) initial conditions. Perhaps we can use another
search, for parameters that don't cause this.
I'm already clearly going to need better/more hardware for simulation
purposes; analysis is much faster than simulation. Fortunately, due to
the batch pull design of the simulator, this hardware doesn't need to be
anything very special, and it doesn't need to be on a fast network as a
computation grid would require. Of course, a computation grid could be
performing batch jobs, if someone has one to spare...
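For anyone unfamiliar with the pull model, the general shape is something like the sketch below. It is only an illustration of the pattern, not the simulator's code: the in-memory `queue.Queue` objects stand in for whatever job store the real system uses, and `simulate` is a hypothetical callback.

```python
import queue


def run_worker(jobs, results, simulate):
    """Pull jobs until none are left, pushing each result back.

    jobs:     queue.Queue of (job_id, params) tuples (stand-in job store)
    results:  queue.Queue receiving (job_id, output) tuples
    simulate: hypothetical callback that runs one simulation
    Returns the number of jobs completed.
    """
    completed = 0
    while True:
        try:
            job_id, params = jobs.get_nowait()
        except queue.Empty:
            return completed
        # Workers never talk to each other, only to the job store,
        # so a slow network link only delays fetching and reporting.
        results.put((job_id, simulate(params)))
        completed += 1
```

Because each worker asks for work when it is ready, slow or unreliable machines just complete fewer jobs rather than holding up the others.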
Peter N