[AMBER] Duplicate random seeds with ig=-1 Code enhancement to think about, perhaps

From: Chris Moth <cmoth08.gmail.com>
Date: Tue, 20 Jun 2017 13:29:41 -0500

I have just had the "interesting" experience that "ig=-1" does not
always generate unique random seeds, and I thought I should share that
experience.....

I have used "ig = -1" to randomize seeds for some time. I used it
without hesitation as I worked through the ASMD tutorial here:

http://ambermd.org/tutorials/advanced/tutorial26/

However, intriguingly, on our high performance cluster at Vanderbilt,
when I submit 100 jobs-at-a-time (an ASMD "stage"), I am seeing
duplicate ig values returned every few hundred runs.

This could plausibly be attributed to a combination of factors:

1) Just as in a room of 23 people there is a 50% chance that 2 will
share the same birthday, in a collection of 100 MD jobs there is
roughly a 0.5% chance (so a bit under 1%) that 2 will share the same
microsecond start time, assuming start times are spread evenly over the
1,000,000 possible values. (Apologies in advance if I butchered some
math.)
2) A high performance cluster, launching multiple simultaneous jobs "at
once", could conceivably turn a chance of 1% or less into a 10% chance
on highly synchronized nodes.
3) The resolution of the gettimeofday() function (called from
pmemd_clib.c) could be significantly coarser than one microsecond in
practice (if Google is to be believed):

https://www.google.com/#q=resolution+of+gettimeofday
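
As a rough check on the birthday-style estimate in point 1, here is a
minimal sketch in C. It assumes each job independently picks its
microsecond value uniformly from the available range, which real
cluster launches may well violate (points 2 and 3). The function name
collision_probability is made up for this illustration and is not part
of pmemd.

```c
#include <stdio.h>

/* Probability that at least two of n_jobs share a value when each job
 * draws independently and uniformly from n_slots possible values.
 * Uses the exact product form: 1 - prod_{k=0}^{n-1} (1 - k / n_slots). */
double collision_probability(int n_jobs, double n_slots)
{
    double p_all_unique = 1.0;
    int k;

    for (k = 0; k < n_jobs; ++k) {
        if ((double)k >= n_slots) {
            return 1.0; /* more jobs than slots: a collision is certain */
        }
        p_all_unique *= 1.0 - (double)k / n_slots;
    }
    return 1.0 - p_all_unique;
}
```

Calling collision_probability(23, 365.0) reproduces the familiar
birthday figure of just over 50%, and collision_probability(100, 1.0e6)
comes out at a little under 0.5% for one batch of 100 jobs. A coarser
clock simply means a smaller n_slots, which raises the probability
quickly.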

It is admittedly a nuisance issue. The choices are:

a) Ignore the issue entirely. Statistically, it's likely not too
important if only one in every 200 md runs is a duplicate run.

b) Set random seeds from environment variables available in the high
performance cluster or ASMD job launch ecosystem *instead* of
"trusting" ig=-1 (examples: task IDs, ASMD stage*10000 + ASMD_run,
etc.), in which case we should update the ASMD tutorial so that ig=-1
at least has a caution around it.

c) Modify the pmemd code to enhance the randomness of ig=-1 by adding
entropy. (The current 0 to 999999 range uses only 20 bits of the 31
that could be used.)
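
For option (b), a minimal sketch of deriving a seed from job metadata
might look like the following. The helper names and the idea of passing
the stage and run numbers through environment variables are assumptions
made for illustration; the actual variable names depend on the
scheduler and on how the ASMD scripts are launched.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative integer from an environment
 * variable. Returns -1 if the variable is unset, empty, or malformed. */
int env_as_int(const char *name)
{
    const char *text = getenv(name);
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0') {
        return -1;
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 0 || value > INT_MAX) {
        return -1;
    }
    return (int)value;
}

/* Hypothetical helper: combine stage and run as stage*10000 + run.
 * Writes the seed to *seed_out and returns 1 on success; returns 0 if
 * run does not fit in four digits or the result would overflow an int. */
int seed_from_stage_run(int stage, int run, int *seed_out)
{
    if (stage < 0 || run < 0 || run >= 10000) {
        return 0;
    }
    if (stage > (INT_MAX - run) / 10000) {
        return 0;
    }
    *seed_out = stage * 10000 + run;
    return 1;
}
```

Because each (stage, run) pair maps to a distinct integer as long as
run stays below 10000, duplicates within one ASMD campaign cannot occur
by construction. The trade-off is that the seeds are reproducible and
predictable rather than random.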

In case "c" is interesting.... read on....

Below is a sketch of code that honors the current microsecond concept,
but multiplies the number of possible seeds by 1000 using a value read
from /dev/urandom on a Linux system. Portability issues are rightly of
great concern to the community. You could activate code like this in
response to a new "ig = -2" possibility, or when "./configure" reports
at install time that /dev/urandom is available. The code below does not
require any new third party libraries (as a "better" entropy generation
scheme or a GUID generator might), and I think it will work on any
Linux system I am aware of from the last decade.

Again, the code below is not intended to be _the_ solution, just some
food for thought if the team should consider enhancing randomness beyond
the current 0-999999 range taken from the system clock. You might want
"ig = -3" (say) to init all 31 bits of the seed from /dev/urandom.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
#include <assert.h>

int main(void)
{
    struct timeval my_tv;
    ssize_t entropyRead;
    int entropyBits;

    // Read one int's worth of random bytes from /dev/urandom
    int entropyFile = open("/dev/urandom", O_RDONLY);
    assert(entropyFile != -1);

    entropyRead = read(entropyFile, &entropyBits, sizeof(entropyBits));
    assert(entropyRead == (ssize_t)sizeof(entropyBits));
    close(entropyFile);
    entropyBits &= 0x7fffffff; // Mask off sign bit
    entropyBits %= 1000;       // Reduce to the range 0..999

    // What you do today in pmemd_clib.c
    gettimeofday(&my_tv, NULL);

    printf("Today's random seed: %09d\n", (int)my_tv.tv_usec);
    // Largest possible value is 999999999, which fits in 31 bits
    printf("Enhanced random seed: %09d\n",
           (int)my_tv.tv_usec + entropyBits * 1000000);
    return 0;
}
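
The "ig = -3" idea mentioned above (taking all 31 bits from
/dev/urandom) could be sketched along these lines. The function name
seed31_from_urandom is hypothetical, and the sketch returns an error
code instead of asserting, on the assumption that a caller would prefer
to fall back to the clock-based seed when /dev/urandom is missing.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: return a seed in the range 0 .. 2^31 - 1 read
 * from /dev/urandom, or -1 if the device cannot be opened or read. */
int seed31_from_urandom(void)
{
    uint32_t bits = 0;
    ssize_t got;
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd == -1) {
        return -1;
    }
    got = read(fd, &bits, sizeof bits);
    close(fd);
    if (got != (ssize_t)sizeof bits) {
        return -1;
    }
    /* Keep the low 31 bits so the result is a non-negative int */
    return (int)(bits & 0x7fffffffu);
}
```

With about 2.1 billion possible values instead of 1,000,000, and no
dependence on job start times, synchronized launches no longer make
duplicates more likely.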


_______________________________________________
AMBER mailing list
AMBER.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
Received on Tue Jun 20 2017 - 11:30:03 PDT