Hi Pete,

>  Jacek:
>
>    o What is the setup of the server side monitoring at SLAC? You now have
>      the final collector machine? Have you had a chance to look at the data?
>

Well, access pattern monitoring has not had much luck. We turned 
monitoring on for all 30 xrootd servers about a week ago, but a few days 
later, due to power problems at SLAC, the collector machine (objydev11) 
disappeared and is still down (along with many other machines). I did 
get a first peek at the data from the first few days though: the 
collector's log files grew at ~20GB/day (before the problems with the 
cond14boot lock server started). The size of the data in MySQL was ~50% 
larger: one day's worth of running = ~30GB. I added a lot of automation 
to the whole process of decoding the data, and prepared some scripts to 
back up the logs to HPSS. The first peek at the data revealed a problem 
with the time windows sent by xrootd (in some cases the begin time is 
after the end time). I temporarily patched it, and I am exchanging 
emails with Andy about that problem. I've had fun improving the code 
for detecting out-of-order packets: in some cases quite a few 
consecutive packets arrived out of order, which was not handled 
correctly before. While testing the code that reorders out-of-order 
packets I managed to wipe out the log files from the first few days, 
before I had pushed them to HPSS. The accident made me redo the 
correction of out-of-order packets to make it safer - I am working on 
it right now.
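
For illustration only, here is a minimal sketch of one way to reorder 
out-of-order packets without touching the original data. It is not the 
collector's actual code. It assumes each packet carries an integer 
sequence number (no wraparound) and that a late packet is never 
displaced by more than `window` positions; the names reorder_packets, 
count_out_of_order and window are hypothetical.

```python
import heapq


def reorder_packets(packets, window=8):
    """Yield (seq, payload) pairs in ascending sequence order.

    packets: iterable of (seq, payload) in arrival order (assumed format).
    window:  how many packets to hold back while waiting for late ones.
    The input is only read, never modified, so the raw log stays intact.
    """
    heap = []
    for idx, (seq, payload) in enumerate(packets):
        # idx breaks ties so payloads are never compared with each other.
        heapq.heappush(heap, (seq, idx, payload))
        if len(heap) > window:
            s, _, p = heapq.heappop(heap)
            yield s, p
    # Input exhausted: flush whatever is still buffered, smallest first.
    while heap:
        s, _, p = heapq.heappop(heap)
        yield s, p


def count_out_of_order(packets):
    """Count packets whose seq is lower than the highest seq seen so far."""
    highest = None
    late = 0
    for seq, _ in packets:
        if highest is not None and seq < highest:
            late += 1
        else:
            highest = seq
    return late
```

The reordered stream is produced as new output rather than rewritten in 
place, so a bug in the reordering step cannot destroy the source logs.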

I spoke with Randy about getting a dedicated machine with ~500GB of 
local disk for collecting/massaging/mining access pattern data. I think 
he wants to get an OK from BaBar first, so no, I don't have it yet.

I expect we might be able to turn monitoring back on early next week, 
but not earlier, due to SC.

Jacek