URL:
  <http://savannah.cern.ch/bugs/?80880>

                 Summary: Default delay servers value is incorrect for supervisors
                 Project: XROOTD
            Submitted by: None
            Submitted on: 2011-04-12 16:09
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
        Originator Email: [log in to unmask]
             Open/Closed: Open
         Discussion Lock: Any
      Fixed by commit(s): 

    _______________________________________________________

Details:

In our STAR deployment, we found an oddity where clients would eventually
be delayed indefinitely by Scalla. After some tracing (with Andy's help), we
found that the supervisors would drop dataservers unexpectedly.

Supervisor A, expecting 48 nodes (B64 at 75%), would find fewer
dataservers because some may have rolled over to another supervisor B,
and would conclude that something is wrong. Periodically restarting a few
dataservers exacerbated the issue, and since we naturally run additional
supervisors, we were bound to reach this situation: restarted dataservers
would roll over to the additional supervisor B (present for resilience and
redundancy), and supervisor A would "miss" its quorum.
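
To make the arithmetic concrete, here is a small Python sketch of the quorum
check as I understand it from the description above. It is only an
illustration: the function names are made up, the rounding rule is an
assumption, and the 40/20 split between supervisors is a hypothetical
example, not a measurement from our cluster.

```python
import math


def quorum(slots: int, percent: float) -> int:
    """Number of dataservers a supervisor waits for.

    Assumption: the percentage is applied to the supervisor's slot count
    (64 for a B64 tree) and rounded up. The real cmsd rule may differ.
    """
    return math.ceil(slots * percent / 100)


def meets_quorum(connected: int, slots: int = 64, percent: float = 75) -> bool:
    """True if enough dataservers are connected to satisfy the quorum."""
    return connected >= quorum(slots, percent)


# B64 at 75% gives the 48 nodes mentioned in this report.
required = quorum(64, 75)

# Hypothetical split: 60 dataservers spread over supervisors A and B.
connected_a, connected_b = 40, 20
a_ok = meets_quorum(connected_a)  # False: A sees only 40 of the 48 it expects
b_ok = meets_quorum(connected_b)  # False: B sees only 20

# With an absolute count of 1 (as in "cms.delay servers 1"), a single
# connected dataserver is enough for the supervisor to stop delaying.
a_ok_with_one = connected_a >= 1
```

Under these assumptions, neither supervisor ever reaches 48 even though all
60 dataservers are up, which matches the behaviour we observed.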

The workaround was to set

if named xrdtestsuper
cms.delay servers 1
else
cms.delay servers 75% service 15 startup 65 suspend 15
fi

which forces the delay value on the supervisor.

Andy requested that a bug report be filed so that this misfeature gets fixed
in a later revision of Xrootd.

Done.

Jerome









    _______________________________________________________

Reply to this item at:

  <http://savannah.cern.ch/bugs/?80880>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/