Hi Pete,
> Jacek:
>
> o What is the setup of the server side monitoring at SLAC? You now have
> the final collector machine? Have you had a chance to look at the data?
>
Well, access pattern monitoring has not had much luck. We turned
monitoring on for all 30 xrootd servers about a week ago, but a few days
later, due to power problems at SLAC, the collector machine (objydev11)
went down and is still down (along with many other machines). I did get
a first peek at the data from the first few days, though: the
collector's log files grew at ~20GB/day (before the problems with the
cond14boot lock server started). The data in MySQL was ~50% larger:
one day's worth of running = ~30GB. I added a lot of automation to
the whole process of
decoding data, and prepared some scripts to back up the logs to HPSS. A
first peek at the data revealed a problem with the time windows sent
from xrootd (in some cases the begin time is after the end time). I
patched it temporarily, and I am exchanging emails with Andy about that
problem. I've had fun with
improving the code for detecting out-of-order packets: in some cases
quite a few consecutive packets arrived out of order, which was not
handled correctly before. While testing the code that reorders
out-of-order packets, I managed to wipe out the log files from the
first few days before I had pushed them to HPSS. The accident made me
redo the out-of-order correction to make it safer; I am working on it
right now.
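To illustrate the out-of-order case, here is a minimal sketch of
sequence-based reordering. It is not the actual collector code. It
assumes each packet carries an 8-bit wrapping sequence number, and the
names reorder, SEQ_MOD and window are hypothetical. Packets that arrive
early are held back until the gap is filled; if more than `window`
packets pile up, the missing one is treated as lost.

```python
SEQ_MOD = 256  # assumption: sequence numbers wrap around after 255


def reorder(packets, window=8):
    """Yield (seq, payload) pairs in sequence order.

    `packets` is an iterable of (seq, payload) in arrival order.
    Up to `window` early packets are buffered while waiting for a gap
    to be filled; beyond that the missing packet is considered lost.
    """
    pending = {}     # seq -> payload for packets that arrived early
    expected = None  # next sequence number we want to emit
    for seq, payload in packets:
        if expected is None:
            expected = seq
        # A packet "behind" the expected one arrived after we gave up
        # on it; this sketch simply drops it.
        if (seq - expected) % SEQ_MOD >= SEQ_MOD // 2:
            continue
        pending[seq] = payload
        while pending:
            if expected in pending:
                yield expected, pending.pop(expected)
            elif len(pending) <= window:
                break  # keep waiting for the missing packet
            # Either emitted or given up on: move to the next number.
            expected = (expected + 1) % SEQ_MOD
    # End of input: flush whatever is still buffered, in order.
    for seq in sorted(pending, key=lambda s: (s - expected) % SEQ_MOD):
        yield seq, pending[seq]


arrived = [(0, "a"), (2, "c"), (3, "d"), (1, "b"), (4, "e")]
print([seq for seq, _ in reorder(arrived)])
```

Because it is a generator over the input, this kind of reordering never
needs to modify the original log files, so it can be tested on a copy.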
I spoke with Randy about getting a dedicated machine with ~500GB of
local disk for collecting/massaging/mining the access pattern data. I
think he wants to get an OK from BaBar first, so no, I don't have it yet.
I expect we might be able to turn monitoring back on early next week,
but not earlier, due to SC.
Jacek