  Hi,

  At the BaBar CM2 Lyon workshop last year a number of xrootd and data 
management use cases were presented:

  http://www.slac.stanford.edu/BFROOT/www/Computing/Distributed/workshops/Jun2003/artem-xrootd-usecase.txt
  http://www.slac.stanford.edu/BFROOT/www/Computing/Distributed/workshops/Jun2003/artem-datamanagement-usecase.txt

  As some of these use cases are still not properly dealt with, I'd like
to go through them here. From this thread we will synthesize the relevant
issues, in particular for proceeding (finally) with the administrative
interface, but also for determining if there are any other missing xrootd
features.

  In what follows, text from Artem's original document is prefixed with "> "
and my comments are interspersed. Some of these things are not really xrootd
topics, but instead are handled by other components (e.g. the BaBar
bookkeeping, etc.). I will mark those as such and we should focus on the
xrootd-specific topics here.

  I sort of expect further comments from Andy and Artem in particular, but
anyone else should feel free to jump in... Artem (and Wilko and others), now 
that you've had to deal with the xrootd system since late last fall, this
would also be a good time to bring up any operational problems or use 
cases that you didn't imagine 1.5 years ago.

  So here we go:

> Use cases for Xrootd.
> ---------------------
> 
> Some comments are placed inline, marked with "===>"
> 
> Admin should be able to stop the server remotely (a la oostopams).

  Presumably needs the administrative interface. Hopefully we can (finally)
provide this in the near future.

> Admin should be able to audit the server (get its state and debug info) 
>   remotely, i.e. via a call to the server.

> Admin should be able to turn on debugging remotely, i.e. via a call to the 
>   server.

  Since the server can be restarted easily without crashing the clients,
the server could also be restarted with extra options in the config file.
I don't feel strongly that this has to be possible via the administrative
interface and don't even know how easy it would be to add this. Andy?
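
  For reference, a restart with extra tracing might look something like the
sketch below. (The directive names here are the ones from the current
documentation and may differ in the deployed version; treat this as
illustrative only.)

    # added to the config file before restarting, to turn on tracing
    xrootd.trace all
    ofs.trace all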

> Admin should be able to read server's log file(s) remotely, via a call. Log 
>   files include main log file, error messages, trace output, pid file.

  IIRC, in the original discussion we had about this, some of us felt that
while this would be useful, it would be overkill, since other tools could
be used.

> Server should be able to log its host's and its process' cpu and memory 
>   utilization and other useful parameters.

  At the time of the workshop SLAC had no useful system monitoring (despite 
having tens of millions of dollars of equipment). Since that time Yemi has 
deployed Ganglia monitoring at SLAC. Using some external agent like 
Ganglia to monitor things like cpu and memory usage seems a better structure 
(and is what we have at SLAC and other places). Artem, are you happy with 
that? (i.e. with no features in xrootd itself for this)
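
  For example, a cron job on each data server could publish the xrootd
process' memory use to Ganglia with something like the following sketch
(gmetric is Ganglia's metric-injection tool; the metric name and pid file
location are just made up for illustration):

    # publish the xrootd resident memory (KB) to Ganglia
    PID=`cat /var/run/xrootd.pid`       # site-specific pid file location
    RSS=`ps -o rss= -p $PID`            # resident set size in KB
    gmetric --name xrootd_rss --value $RSS --type uint32 --units KB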

> ===> Remote administration is essential in a distributed environment.

  This statement I think we all agree with. (But the monitoring issue
in the last point is separate...)

> Admin should be able to dump/load server's configuration remotely.
> ===> 24/7 availability is essential. Stopping 1000+ clients for some simple 
>   tasks like reconfiguration is bad bad bad.

  Well, I think we've done pretty well at SLAC in terms of 24/7 availability.
(CNAF in particular seems to have problems, though.)

  I'm not so clear on why it is useful to dump the configuration remotely.
Artem, do you still feel strongly about this?

> Admin should be able to give a signal to dlb to rescan file system for 
>   new/gone files.

  The olb (once known as the "dlb") doesn't maintain any state on the data
servers, or have I misunderstood? So I'm not sure what this means. The
manager olbd obviously does have a cache, but as I understand it, it also
times out entries older than 8 hours.

  Andy or Fabrizio, could you explain the mechanism by which the client itself
can cause the refresh of the cache on the manager olbd?

> Xrootd, dlb should coexist on the same host with ams(es), so a host can be 
>   used for both objy and root data.
 
  As we wound up setting up things at SLAC, this turned out not to be 
necessary, although IN2P3 did in the end set things up this way. (i.e. it
is possible, but not of general interest, of course.)

> ===> It's very difficult to move objy data around and not possible at all
>   without disruption of access for hours. Therefore the aim should be not
>   to phase out objy servers, but to deploy the same servers for root data,
>   considering of course overall load and file commitment.

  Again, we deployed the xrootd data servers at SLAC independently from the
objy ones. Now the game is to wind down objy usage and move more servers
over from "only objy" to "only root". I think Stephen will post about this.

> Load balancing should not be applied to files not yet backed up (typically, 
>   in a user's production environment). (*)

  I'm not sure I understand this point. Artem, could you explain it again?

> Dlb should be able to distinguish, whether a file is closed or still
>   active (may be written into later). If the file is active, it must not
>   be staged anywhere else, even for r/o access. (*)
> 
> Data Management use cases (wrt xrootd)
> 
> Admin wants to check whether file is on disk and which host(s), and/or in 
>   hpss.
> 
> Admin wants to set a pool of hosts for readonly data.

  This is possible (in several ways), see the examples linked from the
xrootd page. A minimal sketch follows.
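
  As a rough sketch (the olb.path directive and its r/w attributes are as I
remember them from the olbd documentation, and the paths are illustrative;
check the linked examples for the authoritative syntax):

    # on each data server in the read-only pool
    olb.path r /store

    # on each data server in the (separate) user-production write pool
    olb.path w /store/user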

> Admin wants to set a pool of hosts for user's production data. This is 
>   totally separate from readonly hosts.

  In the end we deployed NFS for the users' production data at SLAC, but 
this _is_ possible via the xrootd configurations (as in the sketch above).

> Admin wants to dynamically add or remove a host from a readonly or 
>   write pools.

  This is possible for the readonly pools. As things are configured at
SLAC (for example), removing such a server would eventually cause the
client to come back in through the redirector, and the file would be
staged onto another (read-only) data server.

  For write, it is obviously much more complicated. As you know, we prefer
in BaBar to do the actual writes from production jobs to the local disk
of the machine where the jobs run and then transfer the output when the
job finishes to some production buffer. (Soon to be accessed via the
'xrdcp' command instead of NFS, so that we can add load balancing across
filesystems and servers.)
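
  For example, copying a job's output from local disk into the production
buffer would look something like this (the buffer redirector name and both
paths are hypothetical):

    xrdcp /scratch/myjob/output.root \
          root://buffer.slac.stanford.edu//store/prod/output.root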

  If someone is actually writing to an xrootd server and you then suddenly
take the server down or the machine out of the pool, there really isn't
a graceful way to handle it.

  (i.e. not sure what we can do for "write" beyond what is there now.)

  Probably Alice will learn something about writing via xrootd since I
think they intend to use it that way.

> Admin wants to disable access to certain data sets, should it need
>   so. This means tcl files should not be generated for a user's jobs. ??Other
>   ways to prevent user from accessing some data?? A la inhibit?? Via Xrootd??

  There are parts of this that are really for the BaBar bookkeeping. For
xrootd the question is really something like:

  o Is it possible to prevent access to classes of files? 

  Clearly specific portions of the name space can be exported while others
are excluded, but that is a very coarse-grained control and depends a bit
on how people are constructing their name space. For example, BaBar has file
name spaces like:

   /store/PR/R12/AllEvents/...
   /store/SPskims/R14/14.4.0d/BSemiExcl/....

so one could inhibit things like "/store/SPskims/R14" (release 14 Simulation 
Production skims) or "/store/PR", but other things like individual skims are 
more complicated. [I don't know why the release, 14.4.0d, was put before
the skim name (BSemiExcl).] A sketch of this coarse-grained control follows.
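
  As a sketch using the xrootd.export directive (the directive exists, but
check the documentation for the exact semantics of whitelisting), one could
export only the parts of the name space that should remain accessible:

    # export Prompt Reco and the R12 skims, but omit (inhibit)
    # everything under /store/SPskims/R14
    xrootd.export /store/PR
    xrootd.export /store/SPskims/R12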

  In practice, however, we've not found this necessary in BaBar. Artem, what
was the use case for this in the past?

> Disaster use cases.
> -------------------
> 
> DLB should be dynamically and remotely configured not to redirect requests to 
>   specific hosts, either forever or for specified time.

  I think this is just done by stopping the xrootd on the affected machines
(see the sketch below). The olb can be configured not to accept requests
from the manager if there is no xrootd running. Is that sufficient?
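
  i.e. on the affected machine, something like the following, using the pid
file mentioned earlier (its location is site-specific):

    # stop the xrootd on the affected data server; the manager then
    # stops redirecting clients to it
    kill -TERM `cat /var/run/xrootd.pid`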

> Xrootd should not stop working if hpss goes down. 

  Since it was just announced that HPSS is unavailable about 10 minutes before
I got to writing these lines, we'll see how this goes. I'm not sure we've yet
really gone through an extended HPSS outage, so we'll presumably learn some
things this time.

> When a new file is created, Xrootd should not check whether it's in hpss or not.
> ===> In new model, files are typically combined in large files before 
>   archiving, and each job creates unique file.

  This has to do with the interaction between the "migration" part of the mass 
storage and writing via xrootd. As we don't in general write via xrootd in
BaBar, it hasn't been an important issue. The details of how it works for those
who do write via xrootd will of course depend on the back-end system they
are using for migration to mass storage, so there isn't much I can add...

> When a data host is down, xrootd should automatically avoid this host. It 
>   should report to administrator, via some messaging mechanism, 
>   that a host is down.

  I'm not sure what you meant by "avoid" here, but if a host (i.e. a data 
server) is down, its olbd will not be subscribed to the manager and it will
effectively be ignored.

  As to the "reporting to the administrator" part of this use case, we 
decided to make the "alarm" mechanism external to the xrootd system. This
should be handled by something else (e.g. like alarms with Ganglia, say). 
Artem, is that sufficient?

> When a file system on a host crashes, xrootd should automatically recover. 
>   It should report that the FS is down.

  How is it supposed to recognize that there is a problem with the filesystem?

> DLB should be checking the xrootd "health" of a data server and its
>   filesystems as a part of the load measure. Should report if it finds
>   something wrong. Anything that prevents xrootd or dlb from doing its job,
>   like network problems, afs troubles, should be reported.

  Again, it isn't clear to me what exactly should be monitored. Can you
give examples? 

  Artem, do you agree that this isn't the job of the xrootd system itself, but
of something like Ganglia? (Or whatever, something designed to do monitoring
and alarms of systems.) We shouldn't reinvent that wheel.

> Reporting: dlb should be able to send messages to some other application 
>   for further error handling.
> ===> Reporting error conditions in a timely manner is essential. It doesn't 
>   make a lot of sense to build another monitoring system, if dlb is 
>   already doing so.

  I disagree. A real (complete, full) monitoring system should be used. The
olbd (once called "dlb") does a very limited set of things as part of its
load balancing job.

> Recovery use cases:
> -------------------
> 
> Admin should be able to close file descriptor selectively or all of it.
> ===> Used for substituting files on disk, like during conditions sweeps.

  This we've not yet dealt with. Since we (currently) only serve event data
via xrootd, we have yet to deal with any "version" issues like this for files.
As you probably know, I'm ever more of the opinion that BaBar


> Admin should be able to ask Xrootd to re-stage a file from the mass storage.
> ===> In case file is corrupted.

  I think that the actual preferred solution here is that the Admin will just
remove the offending file from disk. The system itself takes care of restaging
it automatically the next time it is requested.
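
  i.e. something like the following on the data server holding the bad copy
(the file name is made up; the path is the BaBar example from above):

    # remove the corrupted disk copy; the next client open goes back
    # through the redirector and triggers a fresh stage from HPSS
    rm /store/PR/R12/AllEvents/badfile.root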

> Testing usecase
> ---------------
> 
> Dlb sensors should be able to simulate various load conditions on a host in 
> order to test its functionality.

  I'm not sure how we could simulate load conditions _internal_ to the olbd
in a way that isn't so artificial that it doesn't really test anything. What
you want could, however, presumably be accomplished by providing dummy
scripts via the "olb.perf" directive, as in the sketch below.
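
  A sketch (the olb.perf syntax and the load-report format here are from
memory and should be checked against the olbd manual):

    # in the olbd config: run our dummy monitor, sampled every 30 seconds
    olb.perf int 30 pgm /usr/local/bin/fakeload

  where /usr/local/bin/fakeload periodically writes fixed load values:

    #!/bin/sh
    # pretend cpu, i/o, etc. are all at 75% so the manager sees a
    # constant, known load for this host
    while true; do
        echo "75 75 75 75 75"
        sleep 30
    done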

  Data management use cases:

> Selection
> ---------
> 
> Admin wants to select files based on input datasets, collections, production 
> date (+ since/before), file names, location, production stream or physics  
> group, whether file is active/close etc.

  All of this can be done via the BaBar bookkeeping. The xrootd system provides
primarily run-time data access.

  (I'm not sure what was meant by "active/close" here, though.)

> Dataset/File Operations
> ---------------
> Admin wants to do most of the file operations remotely, i.e. without logging
> in to a data server.

  This requires the administrative interface, which is currently lacking. We
should just add this.

> Admin wants to list files with attributes, i.e. a-, m-, c-times, file size,
> full path, file permissions,
> 
> Admin wants to check whether file is on disk, in hpss.
> Admin wants to check whether file is backed up or needs back up.
> Admin wants to pre-stage a file from hpss into disk cache, with confirmation
> Admin wants to migrate a file to hpss, with confirmation.
> Admin wants to remove a file from disk cache.
> Admin wants to copy/relocate a file to another disk cache.
> Admin wants to change file permissions.
> Admin wants to pin files on disk for specified time.
> 
> Admin wants to combine some operations into one, like: migrate+remove,
> migrate+copy, stage+chmod,

  Some of the above are presumably core functions of the administrative
interface or of XTNetAdmin (or a successor). The HPSS ones may or may not be...

> Admin wants to check file for corruption/readability.

  For this, the only thing that one can currently do is verify that the
checksum matches that in some external catalog (e.g. in the file catalog
of the BaBar bookkeeping). The only thing xrootd should do at this point
is allow one to ask for the checksum of a file. (Something I've been meaning
to add to XTNetAdmin for some weeks now...)
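
  In the meantime, a sketch of the external check (the path is the BaBar
example from above with a made-up file name; the catalog lookup, here just
a shell variable, is hypothetical):

    # checksum of the on-disk copy
    DISKSUM=`md5sum /store/PR/R12/AllEvents/somefile.root | awk '{print $1}'`
    # CATSUM would come from the bookkeeping file catalog
    if [ "$DISKSUM" != "$CATSUM" ]; then
        echo "corrupted copy: remove it and let the system restage"
    fi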

> Admin wants to check what "run files" (or "job files", those produced by a 
>   single job) are included in the composite file.

  I'm not sure what this means. Does it have something to do with the fact
that we merge data produced by multiple jobs? (Before it even gets to this
system.) If so, that isn't an xrootd problem, but a BaBar (production) 
bookkeeping problem.

> ? Admin wants to exclude some "atomic" files from the composite file.

  If this is "merging data during production", then it isn't an xrootd
problem. (And in fact isn't a data administration problem, either.)

> Monitoring
> ----------
> 
> Admin wants to analyze access patterns on disk and hpss, + trends.

  This is external to the xrootd system. When this list of use cases was
written, SLAC was missing a real monitoring system. It now has Ganglia,
which provides much of this information. Separate discussions on
server-side monitoring, client-side monitoring and simple output to the 
server log files are ongoing.

> Server Pool management
> ----------------------
> 
> Admin wants to set a pool of hosts for readonly data.

  This is possible (with various granularities in the file namespace); see
the pool sketch earlier.

> Admin wants to set a pool of hosts for user's production data. This is 
> totally separate from readonly hosts.

  What you want here is possible (see the last point), even if BaBar doesn't
use xrootd for writing user-produced data. (An NFS area is used instead.)

> Admin wants to dynamically add or remove a host from a readonly or write 
> pools.

  I covered this point above. For read it is easy and supported; for write
it is (clearly) more complicated.

> User production
> ---------------
> 
> User produced data (reading off PR collections in analysis environment)
>   is placed on dedicated servers with writing/reading via xrootd enabled,
>   and load balancing disabled. Files are not backed up, until merged into
>   a big file.
> 
> User's generated files can be read immediately after production, but
>   since load balancing is disabled, files can not be restaged/copied to
>   another server.
> 
> When user decides to merge small files, new dataset is created, merged
>   file is archived and purged off the production host. From that time it can
>   only be accessed in the readonly pool.
> 
> Users should have a good way to manage their jobs and datasets. If a user
>   wants to make his data publicly available, he needs to make a dataset
>   in skimdata, (which will be done at the same time with a data backup,
>   i.e. atomically).
> 
> ===> needs thinking - how readonly jobs are configured to select the data 
>      servers pool?

  None of these things are an issue for xrootd, as we use an NFS area for
user data output. As of a couple of months ago we have a means for "publishing"
users' data into the BaBar bookkeeping and for transferring it to HPSS, after
which it can be read back from the xrootd data access system.

> Debugging
> ---------
> 
> Admin wants to find out what site/farm the file was generated at and when.

  This is not an xrootd issue, but a "bookkeeping" one. (But it should be
possible with the BaBar bookkeeping.)

> Admin wants to disable access to certain data sets, should it need
>   so. This means tcl files should not be generated for a user's jobs. ??Other
>   ways to prevent user from accessing some data?? A la inhibit??

  This was covered above.


                                   Pete



-------------------------------------------------------------------------
Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
-------------------------------------------------------------------------