I have just had the "interesting" experience that "ig=-1" does not 
always generate unique random seeds, and I thought I should share that 
experience.
I have used "ig = -1" to randomize seeds for some time.  I used it 
without hesitation as I worked through the ASMD tutorial here:
http://ambermd.org/tutorials/advanced/tutorial26/
However, intriguingly, on our high performance cluster at Vanderbilt, 
when I submit 100 jobs-at-a-time (an ASMD "stage"), I am seeing 
duplicate ig values returned every few hundred runs.
This could plausibly be attributed to a combination of factors:
1) Just as in a room of 23 people there is a roughly 50% chance that 2 
will share the same birthday, in a collection of 100 MD jobs there is a 
chance of somewhat under 1% that 2 will share the same microsecond start 
time, assuming start times are spread uniformly over the 1,000,000 
possible values.  (Apologies in advance if I butchered some math.)
2) A high performance cluster, launching multiple simultaneous jobs "at 
once", could plausibly turn a 1% chance into a 10% chance on highly 
synchronized nodes.
3) The resolution of the gettimeofday() function (called from 
pmemd_clib.c) could be significantly coarser than one microsecond in 
practice (if Google is to be believed):
https://www.google.com/#q=resolution+of+gettimeofday
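
To put a number on point 1, here is a minimal sketch in C that computes 
the exact birthday-style collision probability.  It assumes the start 
times are independent and uniform over the possible tv_usec values, which 
points 2 and 3 suggest may not hold on a real cluster.  The function name 
collision_probability is made up for illustration and is not part of pmemd.

```c
#include <assert.h>

/* Probability that at least two of n_jobs draws coincide, when each draw
 * is independent and uniform over n_values possibilities.
 * Computed exactly as 1 - prod_{k=0}^{n_jobs-1} (1 - k / n_values). */
double collision_probability(int n_jobs, double n_values)
{
    double p_all_unique = 1.0;
    int k;

    assert(n_jobs >= 0 && n_values >= 1.0);

    for (k = 0; k < n_jobs; k++) {
        /* The k-th job must avoid the k values already taken. */
        p_all_unique *= 1.0 - (double)k / n_values;
    }
    return 1.0 - p_all_unique;
}
```

collision_probability(23, 365.0) gives about 0.507 (the classic birthday 
result), and collision_probability(100, 1000000.0) gives just under 0.005.  
Duplicates turning up noticeably more often than that would point toward 
factors 2 and 3 rather than plain bad luck.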
It is admittedly a nuisance issue.  The choices are:
a) Ignore the issue entirely.  Statistically, it's likely not too 
important if only one in every 200 MD runs is a duplicate run.
b) Set random seeds with environment variables available in the high 
performance cluster or ASMD job launch ecosystem *instead* of 
"trusting" ig=-1 (examples: task IDs, (ASMD stage*10000 + ASMD_run), 
etc.), in which case we should update the ASMD tutorial so that ig=-1 
at least has a caution around it.
c) Modify the pmemd code to enhance the randomness of ig=-1 by adding 
entropy.  (The current 0 to 999999 range uses only about 20 bits of the 
31 that could be used.)
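
Option "b" could be sketched roughly as follows, again in C to match the 
code further down.  This is only an illustration under assumptions: the 
helper names seed_from_stage_and_run and env_int_or are invented here, and 
the environment variable names your scheduler or ASMD scripts provide will 
differ from site to site.  The stage is required to be at least 1 so that 
the resulting seed is always positive.

```c
#include <assert.h>
#include <stdlib.h>

/* Deterministic seed: stage * 10000 + run, as suggested in option "b".
 * Unique as long as each (stage, run) pair is used once and run < 10000. */
int seed_from_stage_and_run(int stage, int run)
{
    assert(stage >= 1 && stage <= 200000); /* keeps the result below 2^31 */
    assert(run >= 0 && run < 10000);
    return stage * 10000 + run;
}

/* Read a non-negative integer from an environment variable.
 * Returns fallback if the variable is unset, empty, or not a clean number. */
int env_int_or(const char *name, int fallback)
{
    const char *text = getenv(name);
    char *end;
    long value;

    if (text == NULL || *text == '\0') {
        return fallback;
    }
    value = strtol(text, &end, 10);
    if (*end != '\0' || value < 0 || value > 2000000000L) {
        return fallback;
    }
    return (int)value;
}
```

A job script wrapper might then compute 
seed_from_stage_and_run(env_int_or("ASMD_STAGE", 1), env_int_or("ASMD_RUN", 0)) 
and write the result into the mdin file as the ig value.  ASMD_STAGE and 
ASMD_RUN are hypothetical variable names; on a Slurm cluster a job array 
index such as SLURM_ARRAY_TASK_ID could play the role of the run number.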
In case "c" is interesting.... read on....
Below is a sketch of code that honors the current microsecond concept, 
but multiplies the number of possible seeds by 1000 based on the contents 
of /dev/urandom on a Linux system.  Portability issues are rightly of 
great concern to the community.  You could activate code like this in 
response to a new "ig = -2" possibility, or when "./configure" reports 
at install time that /dev/urandom is available.  The code below does not 
require any new third party libraries (as a "better" entropy generation 
scheme or a GUID generator might), and I think it will work on any Linux 
system I am aware of from the last decade.
Again, this code below is not intended to be _the_ solution - just some 
food-for-thought if the team should consider enhancing randomness beyond 
the current 0-999999 limited sys clock.  You might want "ig = -3" (say) 
to init all 31 bits of the seed from /dev/urandom.........
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
#include <assert.h>
int main(void)
{
    struct timeval my_tv;
    ssize_t entropyRead;
    int entropyBits;
    int entropyFile = open("/dev/urandom", O_RDONLY);
    assert(entropyFile != -1);
    entropyRead = read(entropyFile, &entropyBits, sizeof(entropyBits));
    assert(entropyRead == (ssize_t)sizeof(entropyBits));
    close(entropyFile);
    entropyBits &= 0x7fffffff; // Mask off sign bit
    entropyBits %= 1000;       // Keep a value from 0 to 999
    // What you do today in pmemd_clib.c
    gettimeofday(&my_tv, NULL);
    printf("Today's random seed:  %09d\n", (int)my_tv.tv_usec);
    // Largest possible result is 999999999, which still fits in 31 bits
    printf("Enhanced random seed: %09d\n",
           (int)my_tv.tv_usec + entropyBits * 1000000);
    return 0;
}
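
The "ig = -3" idea mentioned above (taking all 31 bits from /dev/urandom) 
might look something like the sketch below.  The function names 
mask_to_31_bits and seed31_from_urandom are invented for illustration, and 
returning -1 on failure is just one possible convention; how pmemd would 
want to handle a missing /dev/urandom is an open question.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Clear the top bit so the value fits in a non-negative 32-bit integer. */
int32_t mask_to_31_bits(uint32_t raw)
{
    return (int32_t)(raw & 0x7fffffffu);
}

/* Read 4 bytes from /dev/urandom and return a seed in [0, 2^31 - 1].
 * Returns -1 if the device cannot be opened or fully read. */
int32_t seed31_from_urandom(void)
{
    uint32_t raw = 0;
    ssize_t got;
    int32_t seed;
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd == -1) {
        return -1;
    }
    got = read(fd, &raw, sizeof raw);
    close(fd);
    if (got != (ssize_t)sizeof raw) {
        return -1;
    }
    seed = mask_to_31_bits(raw);
    assert(seed >= 0); /* sanity check: sign bit was cleared */
    return seed;
}
```

With about 2.1 billion possible values instead of 1,000,000, the chance of 
two jobs in a batch of 100 drawing the same seed becomes negligible, and it 
no longer depends on how closely the jobs' start times are synchronized.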
_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jun 20 2017 - 11:30:03 PDT