XROOTD-L Archives (XROOTD-L@LISTSERV.SLAC.STANFORD.EDU), September 2004
Subject: Re: xrootd/data management use cases from last year's Lyon workshop
From: Peter Elmer <[log in to unmask]>
Date: Fri, 10 Sep 2004 13:59:41 +0200
Content-Type: text/plain
Parts/Attachments: text/plain (368 lines)

  Hi Artem,

On Thu, Sep 09, 2004 at 05:48:47PM -0700, Artem Trunov wrote:
> > > Admin should be able to turn on debugging remotely, i.e. via a call to the
> > >   server.
> >
> >   Since the server can be restarted easily without crashing the clients,
> > the server could also be restarted with extra options in the config file.
> > I don't feel strongly that this has to be possible via the administrative
> > interface and don't even know how easy it would be to add this. Andy?
> 
> Config files are tailor controlled, which means you have to run tailor or
> wait for tailor's scheduled run to update the config, and then restart.
> Seems like a hassle; otherwise drop this use case.

  [Note for non-SLAC people: tailor/taylor is a SLAC-custom system for
distributing config files and software to the machines they maintain.]

  I don't feel strongly about it. Andy should comment, though. I can perhaps
see that being able to turn on debugging in-situ without restarting the
server might have some value in some particular (theoretical) situation where 
the server has begun to behave strangely.

> > > Admin should be able to read server's log file(s) remotely, via a call. Log
> > >   files include main log file, error messages, trace output, pid file.
> >
> >   IIRC, in the original discussion we had about this some of us felt that
> > this would be useful, but overkill, since other tools could be used.
> 
> I think overkill is to use 'ssh' and 'more' to examine up to 48 log files
> on 40 hosts. (48 is xrootd, olb, mlog, plog, slog, Slog x 8).

  [Note for non-SLAC people: the mlog/plog/slog/Slog files are all (I think)
associated with the staging system at SLAC and not the xrootd/olbd itself.]

  A couple of things to note:

   o the xrootd system will likely _never_ have access to the staging log
     files (mlog/plog/slog/Slog) so you'll probably have to face this 
     problem in any case. 

   o The administrative interface will connect to the xrootd (and not the
     olbd). I don't know if the olbd protocol would support providing _its_
     log file to xrootd, etc. etc.

  The whole thing starts to get rather ugly. In the end you will always have
some other log file (Ganglia?) that you want to look at which is on the
data server itself. This is a general problem, so there must be some tool out
there to harvest or display log files like this. If not, one can presumably
make something simple, as you have undoubtedly already done.

  (i.e. you can try to convince Andy, but I won't help you do it for this 
   one...)
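
  For what it's worth, here is a minimal sketch of the "something simple" I
have in mind, in Python, assuming passwordless ssh to the data servers; the
host names, log paths and line count are made-up placeholders:

    #!/usr/bin/env python
    # Minimal log-harvester sketch: pull the tail of a few log files from
    # each data server over ssh and print them locally. Host names, log
    # paths and the line count below are placeholders.
    import subprocess

    HOSTS = ["dataserver01", "dataserver02"]             # hypothetical hosts
    LOGS = ["/var/log/xrootd.log", "/var/log/olbd.log"]  # hypothetical paths
    LINES = 200

    for host in HOSTS:
        for log in LOGS:
            print("==== %s:%s ====" % (host, log))
            # Assumes key-based (passwordless) ssh access to the host.
            subprocess.call(["ssh", host, "tail", "-n", str(LINES), log])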

> > > Server should be able to log its host's and its process's CPU and memory
> > >   utilization and other useful parameters.
> >
> >   At the time of the workshop SLAC had no useful system monitoring (despite
> > having tens of millions of dollars of equipment). Since that time Yemi has
> > deployed the Ganglia monitoring at SLAC. Using some external agent like
> > Ganglia to monitor things like cpu and memory usage seems a better structure
> > (and is what we have at SLAC and other places). Artem, are you happy with
> > that?  (i.e. with no features in xrootd itself for this)
> 
> We don't have programmatic access to monitoring info, and therefore can
> not manipulate all figures at will. Also, other users of xrootd don't use
> Ganglia.

  Other users will then presumably use some other system for monitoring 
and/or alarms. I don't see any reason to bloat xrootd by making it a 
generalized monitoring system. There are much better external tools for
that. xrootd has to provide access to statistics/metrics _specific_ to
xrootd and it in fact does that. Some external system (like Ganglia) should
gather those, display them, archive them so time series can be examined, etc.
At SLAC Yemi is in fact doing this.

  The issue of programmatic access to Ganglia information is not related
to xrootd. We can continue that discussion, but not here...

> > > Admin should be able to dump/load server's configuration remotely.
> > > ===>24/7 availability is essential. Stopping 1000+ clients for some simple
> > >   tasks like reconfiguration is bad bad bad.
> >
> >   Well, I think we've done pretty well at SLAC in terms of 24/7 availability.
> > (CNAF in particular seems to have problems, though.)

  A side note on my comment (above) about CNAF: when I was writing up my
original posting CNAF started (again) to have problems. The issue was one
we have seen before: the CNAF BaBar xrootd/olbd system is a pure disk cache
and one of the data servers had gone down. While the xrootd/olbd and client
support handling this gracefully, CNAF has no back-end mass storage or other
means of replicating (or "reobtaining") files when a disk server goes down.
Thus any problem with a disk server results immediately in user complaints
about missing files.

  There are several solutions here:

   o Doubling the size of the disk cache and putting more than one copy of 
     every file on different servers in the disk cache (ok, I'm joking).

   o Using the mass storage at CNAF (apparently problematic and not currently
     foreseen)

   o Allowing the files to be read or "reobtained" in their entirety from some
     other Tier A center (e.g. via the proxy clients that Andy has described
     several times)

  It sounds like we may have the pieces necessary to try the last possibility
at some point in the not-too-distant future.

> >   I'm not so clear on why it is useful to dump the configuration remotely.
> > Artem, do you still feel strongly about this?
> 
> The ultimate goal of remote administration is not to log into any of the kan
> servers at all. If you need to log in to check what the config file is,
> this complicates your life.

  I guess I don't feel strongly about this one and, on reflection, can see
how it could be useful. Simply looking at the config file on disk won't
always tell you with what configuration the server was started as it may
have been overwritten with a newer version of the config file since the
server was started. You _might_ be able to backtrack through the log files
to when the server was started to look at the printout, but if it has
been running for many days that might not be so easy. (And in fact the
log files could even have been purged.) Andy should comment.

> > > Admin should be able to give a signal to dlb to rescan file system for
> > >   new/gone files.
> >
> >   The olb (once known as the "dlb") doesn't maintain any state on the data
> > servers, or have I misunderstood? I'm not sure what this means. The manager
> > olbd obviously does have a cache, but as I understand it, it also times
> > out entries older than 8 hours.
> 
> So the proposal is to make it admin-induced in addition.

  For new files, it should be sufficient to simply send a prepare with the
list of files (a "prepare" with the list rather than simply opening them
serially to avoid paying the 10s "wait" between each one). Thus this is
presumably doable. If the admin has in fact put the files on disk for some 
reason and expects that they will be accessed in the next 8 hours, it could
be worthwhile. Otherwise the users will just take care of this naturally
as they begin to access the files.
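
  Roughly, in pseudocode, what I mean by "one prepare for the whole list"
(send_prepare() below is just a hypothetical stand-in for whatever actually
issues the prepare request, kXR_prepare, to the redirector):

    # Sketch: announce newly placed files in one shot. send_prepare() is a
    # hypothetical placeholder for the code that actually issues a single
    # kXR_prepare request to the redirector; the point is that the whole
    # list goes in one request instead of opening each file serially and
    # paying the ~10 s "wait" per file.
    import sys

    def send_prepare(redirector, paths):
        # Placeholder transport: a real version would speak the xrootd
        # protocol (or call a client tool) with the full path list.
        print("would prepare %d files via %s" % (len(paths), redirector))

    def announce_new_files(redirector, listing_file):
        with open(listing_file) as f:
            paths = [line.strip() for line in f if line.strip()]
        send_prepare(redirector, paths)   # one prepare for the whole batch

    if __name__ == "__main__":
        announce_new_files(sys.argv[1], sys.argv[2])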

  For files which have been removed, it will presumably depend on why
they have been removed. When the client runs into the situation where
it: 

    o tries to open a file X via the redirector xrootd
    o is redirected to server Y because the manager olbd has 
      "file X -> server Y" in its cache
    o is told by server Y that it doesn't in fact have file X

it is supposed to go back to the redirector xrootd and open the file at
the redirector again, but this time with kXR_refresh set. In this case
the result will be:

   o the file is found on some other server and the client is redirected there
   o the client is redirected to some server which will stage it in again

  What exactly would the admin be trying to achieve after having removed
a file? If it is simply to save clients some time going through the
(ask/be-redirected/not-there/go-back-to-redirector-to-ask-and-refresh)
cycle, that could perhaps be useful, but it isn't critical as the client
will do the right thing. In practice you probably want some way to do
something like kXR_refresh without actually opening the file (kind of like
a kXR_forget...). The next client that comes in will then actually trigger
the system to find the file. Andy?
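
  In client-side pseudocode the cycle above looks roughly like this;
open_at() and NotHere are hypothetical stand-ins for the real client
machinery, and the refresh flag corresponds to kXR_refresh:

    # Sketch of the client-side cycle described above. open_at() and NotHere
    # are hypothetical stand-ins for the real client code; refresh=True
    # corresponds to setting kXR_refresh on the open.

    class NotHere(Exception):
        """Raised when the data server says it does not have the file."""

    def open_at(redirector, path, refresh=False):
        # Placeholder: the real client contacts the redirector, follows the
        # redirect, and either returns a handle or raises NotHere.
        raise NotImplementedError

    def open_with_retry(redirector, path):
        try:
            return open_at(redirector, path)      # normal open via redirector
        except NotHere:
            # Server Y didn't have the file: go back to the redirector and
            # ask again with kXR_refresh set so its stale entry is dropped.
            return open_at(redirector, path, refresh=True)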

> >   In practice, however, we've not found this necessary in BaBar. Artem, what
> > was the use case for this in the past?
> 
> Inhibiting federations for maintenance, preventing users from running on
> bad data.

  Personally I don't think it is the job of the data access system to prevent 
users from running on "bad" data (i.e. data deprecated because of "data 
quality" or because it has been reprocessed or whatever). That can get
extremely complicated very quickly as (a) the granularity of what is being 
rejected can be very small and (b) there may be completely legitimate reasons
for someone to access data declared "bad" (e.g. to determine how something
changed from "bad" data to "good" data). Some bookkeeping system unrelated
to the data access system should help users with making sure they run the
"right" jobs on the right data.

  There is definitely room for "policy" in terms of who can access what (and,
for example, who can cause which things to be staged), but my guess is that
it has to be a fairly coarse-grained thing (e.g. /store/R14/* files are okay,
but /store/R10/* are just too old, sorry, charlie...). The MPS stuff that
is there, as you know, does allow you to do some things at a per-file level,
like pinning them on disk. I'd be curious to hear what others think, though.
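
  Just to illustrate the sort of granularity I mean (this is not an existing
xrootd feature; the prefixes are the examples above and everything else is
made up):

    # Illustration only -- not an existing xrootd feature. A coarse,
    # path-prefix level policy of the kind described above.
    ALLOWED_PREFIXES = ["/store/R14/"]   # recent data: okay
    DENIED_PREFIXES = ["/store/R10/"]    # "just too old, sorry, charlie..."

    def access_allowed(path):
        if any(path.startswith(p) for p in DENIED_PREFIXES):
            return False
        return any(path.startswith(p) for p in ALLOWED_PREFIXES)

    assert access_allowed("/store/R14/run42/file.root")
    assert not access_allowed("/store/R10/run7/file.root")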

  The only analog of the Objy system's "inhibiting federations for
maintenance" here is taking down a data server. The "unsolicited
response" feature (via the administrative interface) is the place to deal
with that.

> > > DLB should be dynamically and remotely configured not to redirect requests to
> > >   specific hosts, either forever or for specified time.
> >
> >   I think this is just done by stopping the xrootd on the affected machines.
> > The olb can be configured not to accept requests from the manager if there
> > is no xrootd running. Is that sufficient?
> 
> No. A stopped xrootd needs to tell clients to come back in a certain time
> that the admin specifies (remotely). For example, if unix-admins need to
> reboot a machine to apply a patch, we'd rather have clients wait for 10
> minutes than redirect them and restage a file somewhere else.
> So we'd need to tell the redirector to hold those clients who need to
> access that host for 10 minutes.

  OK, I was reading your use case too literally. Again, the "unsolicited
response" stuff was foreseen to cover this class of use cases (but it is the
xrootd which does this, not the olbd). We should probably go through the
various sub-use-cases here and categorize them to make sure they are supported.

> > > Xrootd should not stop working if HPSS goes down.
> >
> >   Since it was just announced that HPSS is unavailable about 10 minutes before
> > I got to writing these lines, we'll see how this goes. I'm not sure we've yet
> > really gone through an extended HPSS outage, so we'll presumably learn some
> > things this time.
> 
> If xrootd needs to stage in a file and gets an error other than "file
> not existent", it should handle this gracefully, holding the client for some
> time (externally (and remotely! and dynamically!) configurable).

  The claim is that this is the case. This isn't really xrootd itself, though,
but the staging software. (If it starts to return lots of "file not
existent" messages, there isn't much xrootd can do about it.) I've not yet
looked at the log files; do we know what happened during the period of the
HPSS outage the other day? Did it handle it gracefully?

> > > When a file system on a host crashes, xrootd should automatically recover.
> > >   It should report that the FS is down.
> >
> >   How is it supposed to recognize that there is a problem with the filesystem?
> 
> it gets a distinct return code from open, seek, read, etc.

  Could you be more specific about which return codes it would get to know 
that it is a "filesystem problem"? (Which I read as "hardware problem", so
perhaps you should be more specific about which filesystem problems you 
mean.)

> > > DLB should be checking xrootd "health" of a data server and its
> > >   filesystems as a part of the load measure. Should report if it finds
> > >   something wrong. Anything that prevents xrootd or dbl from doing its
> > >   job, like network problems or AFS troubles, should be reported.
> >
> >   Again, it isn't clear to me what exactly should be monitored. Can you
> > give examples?
> 
> If one of the file systems is performing significantly worse than another
> one, this should be noticed. Just recently a disk on bbr-xfer05 was very
> slow; no one noticed, only Remi did, and I don't know how.

  Why in the end was the disk slow?

> >   Artem, do you agree that this isn't the job of the xrootd system itself, but
> > of something like Ganglia? (Or whatever, something designed to do monitoring
> > and alarms of systems.) We shouldn't reinvent that wheel.
> 
> I don't care what does the monitoring and alarming. I gave you some
> reasoning for close-coupling it with xrootd. Again: if there is an
> application that provides data access, it should monitor data access
> related performance and alarm when data access has some problems.
> Note again that we don't have any more or less convenient, not
> to say sophisticated, alarming.

  Then we should have the separate discussion about how to deal with the
alarms and monitoring. IMO all xrootd should do is report basic statistics
about itself or which it gathers in doing its job. Some other entity should 
accumulate those and implement the logic which decides whether to "report" 
something or (for example) shut down a particular data server or whatever.

  We may be talking past each other here as you keep using words like "report",
"monitor" and "alarm". There are tools to do that. xrootd shouldn't reinvent
it, IMO. (This is sort of a policy vs mechanism discussion.)

  Please take a look at the actual statistics which xrootd can currently
provide when queried (look for the "kXR_query" section in the protocol
document). Are there other _specific_ things (_incidental_ to its normal 
operation) which xrootd could or should collect and report as part of 
those statistics? If it is something non-incidental and non-xrootd specific
some external tool is likely to be better for the job.
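
  To illustrate the division of labour I'm arguing for: something external
polls the xrootd statistics and forwards a few numbers to the monitoring
system. In the sketch below fetch_xrootd_stats() is a hypothetical stand-in
for a kXR_query-based client call, the metric names are made up, and the
gmetric call assumes a Ganglia installation:

    # Sketch of an external poller: ask xrootd for its own statistics and
    # hand a few of them to the site monitoring system (Ganglia's gmetric
    # here). fetch_xrootd_stats() is a hypothetical stand-in for a client
    # call using kXR_query; the metric names are made up.
    import subprocess
    import time

    def fetch_xrootd_stats(host):
        # Placeholder: a real version would issue a kXR_query statistics
        # request and parse the reply into a {name: value} dictionary.
        return {"xrootd_connections": 0, "xrootd_bytes_out": 0}

    def publish(stats):
        for name, value in stats.items():
            # Assumes Ganglia's gmetric command is available on this host.
            subprocess.call(["gmetric", "--name", name,
                             "--value", str(value), "--type", "uint32"])

    while True:
        publish(fetch_xrootd_stats("some-redirector"))   # hypothetical host
        time.sleep(300)                                  # every 5 minutes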

  Once we have gotten past that, we could then look to see what could be
done with Ganglia and alarms at that level.

> >   As to the "reporting to the administrator" part of this use case, we
> > decided to make the "alarm" mechanism external to the xrootd system. This
> > should be handled by something else (e.g. like alarms with Ganglia, say).
> > Artem, is that sufficient?
> 
> Maybe, but the idea is that xrootd will detect error conditions
> immediately, while any external system will need some time. When xrootd
> detects an error condition, it can react to it by adjusting something, e.g.
> turning itself off, while an external system will merely notify someone.
> Besides, we don't really have any kind of alarm system. This is actually a
> good thing for brainstorming, since we need to clearly define what we
> monitor and react on, and what the unix-admins' responsibilities are. I've
> always wondered why we have to tell the u-a about dead hosts and file
> systems, and not vice versa.

  Well, part of the problem was the lack of something like Ganglia at SLAC.
I agree that it would be good to discuss the specific failure modes. Could 
you talk to the others at SLAC and try to make a list of things that have 
happened? (And, if possible, what a possible reaction would be if the same
thing happens in the xrootd/olbd system.)

> > > Reporting: dbl should be able to send messages to some other application
> > >   for further error handling.
> > > ===> Reporting error conditions in a timely manner is essential. It doesn't
> > >   make a lot of sense to build another monitoring system, if dbl is
> > >   already doing so.
> 
> >   I disagree. A real (complete, full) monitoring system should be used. The
> > olbd (once called "dlb") does a very limited set of things as part of its
> > load balancing job.
> 
> I am talking about a system that reacts to errors, not one that monitors
> them. So what if a filesystem goes down at 2 am - I am less interested in
> receiving an alarm; I'd rather see xrootd reconfigured and restarted to
> avoid using that filesystem. Another example: a user job crashes and gives
> the message "file not existent". This leaves the user wondering why. If
> xrootd could not only print this error to the client and its log, but also
> pass it to some intelligent error-processing system, such a system could
> attempt to find out why the file is missing and send the user a more
> detailed explanation and suggestions.

  Hmm, perhaps something like: 

   Here is Artem's telephone number, call him and ask him to take a look. He's
   probably up anyway.

;-)

  Let's finish the discussion of what xrootd should and should not do. Then
we can talk about interaction with users, etc.

> > > Testing use case
> > > ----------------
> > >
> > > Dlb sensors should be able to simulate various load conditions on a host in
> > > order to test its functionality.
> >
> >   I'm not sure how we could simulate load conditions _internal_ to the olbd
> > in a way that isn't so artificial that it doesn't really test anything.
> > What you want could however presumably be accomplished by providing dummy
> > scripts to the "olb.perf" directive.
> 
> So, the scripts should be able to simulate the load.

  Since the output of the script is really trivial (5 numbers), I would
hesitate to complicate the existing script (XrdOlbMonPerf) by trying to
foresee the full set of test cases someone might want and implementing them
there. That would be confusing.
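
  If someone does want to play with this, a separate dummy program handed to
the olb.perf directive seems the right place for it. A minimal sketch,
assuming the interface really is just "write the five load numbers to stdout
once per interval" as described above (check the olbd documentation for the
exact meaning and valid range of the numbers):

    #!/usr/bin/env python
    # Minimal dummy performance reporter for experimenting with olb.perf.
    # Assumes the interface is simply "write the five load numbers to stdout
    # once per interval"; check the olbd documentation for the exact meaning
    # and valid range of each number before relying on this.
    import sys
    import time

    INTERVAL = 60                     # seconds between reports
    FAKE_LOAD = [10, 20, 30, 40, 50]  # the five simulated load figures

    while True:
        print(" ".join(str(n) for n in FAKE_LOAD))
        sys.stdout.flush()            # make sure the olbd sees each line
        time.sleep(INTERVAL)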

                                   Pete

-------------------------------------------------------------------------
Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
-------------------------------------------------------------------------

