Today Bill and I attempted to start 1092 xrootd data servers to see whether
they could self-cluster and deliver data. I would call the test a success
even though only 900 nodes were able to come up, which is still an
excellent result for this test.

I traced the failures of the other 192 nodes to test debris left behind
within the last year. Specifically, clusters were started by various people
on certain broncos and toris, which caused files to be created with the
wrong ownership. This prevented the olbd from coming up, as the files were
either inaccessible or not overwritable due to permission problems. I will
be looking to see how we can avoid that problem in the next release.
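
As a rough idea of what a pre-start check for this could look like, here is
a minimal sketch. It is not what the olbd does today; the function name
find_unusable_files and the directory passed to it are hypothetical, and it
assumes a Unix node where the server runs as the invoking user. It walks a
directory and reports files that are owned by someone else or that cannot
be read and overwritten.

```python
import os


def find_unusable_files(root, uid=None):
    """Return sorted paths under root that uid does not own or cannot overwrite.

    Hypothetical helper: 'root' stands in for whatever directory the
    server writes its working files to.
    """
    if uid is None:
        uid = os.getuid()  # assume the server runs as the invoking user
    bad = []

    def note_error(err):
        # Directories we cannot even list are reported as unusable too.
        if err.filename:
            bad.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=note_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                bad.append(path)  # inaccessible file
                continue
            wrong_owner = st.st_uid != uid
            # os.access checks against the real uid of this process.
            not_rw = not os.access(path, os.R_OK | os.W_OK)
            if wrong_owner or not_rw:
                bad.append(path)
    return sorted(bad)
```

Running something like this on each node before startup would list the
leftover files up front instead of having the daemon fail on them.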

Once we overcame some LSF scalability issues (we used lsrun to start the
servers), the test went smoothly and 900 nodes were able to reach a stable
configuration in about 56 seconds.

I will try to cluster all 1092 nodes next week.

Andy