I have just had the "interesting" experience that "ig=-1" does not
always generate unique random seeds, and I thought I should share that
experience.....
I have used "ig = -1" to randomize seeds for some time. I used it
without hesitation as I worked through the ASMD tutorial here:
http://ambermd.org/tutorials/advanced/tutorial26/
However, intriguingly, on our high performance cluster at Vanderbilt,
when I submit 100 jobs-at-a-time (an ASMD "stage"), I am seeing
duplicate ig values returned every few hundred runs.
This could conceivably be attributable to a combination of factors:
1) Just as in a room of 23 people, there is a 50% chance that 2 will
share the same birthday... in a collection of 100 MD jobs, there is
roughly a 0.5% chance that 2 will share the same microsecond value in
their start times, assuming all 1,000,000 values are equally likely.
(Apologies in advance if I butchered some math.)
2) A high performance cluster, launching multiple simultaneous jobs "at
once", could conceivably raise that chance tenfold on highly
synchronized nodes.
3) The resolution of the gettimeofday() function (called from
pmemd_clib.c) could be significantly coarser than one microsecond in
practice (if Google is to be believed):
https://www.google.com/#q=resolution+of+gettimeofday
It is admittedly a nuisance issue. The choices are:
a) Ignore the issue entirely. Statistically, it's likely not too
important if only one in every 200 md runs is a duplicate run.
b) Set random seeds with environment variables available in the high
performance cluster or ASMD job launch ecosystem *instead* of
"trusting" ig=-1 (examples: task IDs, (ASMD stage*10000 + ASMD_run),
etc) (in which case we should update the ASMD tutorial - so that ig=-1
at least has a caution around it)
c) Modify the pmemd code to enhance randomness of ig=-1, by adding
entropy. (The current 0 to 999999 range is only using 20 bits of the 31
that could be used).
In case "c" is interesting.... read on....
Below is a sketch of code that honors the current microsecond concept,
but multiplies the number of possible seeds by 1000 based on the
contents of /dev/urandom on a linux system. Portability issues are
rightly of great concern to the community. You could activate code like
this in response to a new "ig = -2" possibility, or in response to
install-time's "./configure" reporting that /dev/urandom is available.
The code below does not require any new third party libraries (as a
"better" entropy generation scheme or a guid generator might)... and I
think it will work on any linux system I am aware of from the last decade.
Again, this code below is not intended to be _the_ solution - just some
food-for-thought if the team should consider enhancing randomness beyond
the current 0-999999 limited sys clock. You might want "ig = -3" (say)
to init all 31 bits of the seed from /dev/urandom.........
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
#include <assert.h>

int main(void)
{
    struct timeval my_tv;
    ssize_t entropyRead;
    int entropyBits;

    // Read one int's worth of random bytes from the kernel
    int entropyFile = open("/dev/urandom", O_RDONLY);
    assert(entropyFile != -1);
    entropyRead = read(entropyFile, &entropyBits, sizeof(entropyBits));
    assert(entropyRead == (ssize_t)sizeof(entropyBits));
    close(entropyFile);

    entropyBits &= 0x7fffffff; // Mask off sign bit
    entropyBits %= 1000;       // Keep a value in the range 0 to 999

    // What you do today in pmemd_clib.c
    gettimeofday(&my_tv, NULL);
    printf("Today's random seed: %09d\n", (int)my_tv.tv_usec);

    // Largest possible value is 999999999, which still fits in 31 bits
    printf("Enhanced random seed: %09d\n",
           (int)my_tv.tv_usec + entropyBits * 1000000);
    return 0;
}
Received on Tue Jun 20 2017 - 11:30:03 PDT