Hi Pete,

>   What was the "local admin interface"?
That is a Unix named socket on the local host that can only be used by the
same user that xrootd is running under.
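For illustration only (the path and code below are made up, not the actual
xrootd admin protocol), talking to such a named socket looks roughly like
this; the socket file's owner-only permissions are what restrict it to the
account running xrootd:

  // Hypothetical sketch only: connecting to a local Unix named socket.
  #include <sys/socket.h>
  #include <sys/un.h>
  #include <unistd.h>
  #include <cstring>
  #include <cstdio>

  int main()
  {
      const char *path = "/tmp/xrootd/admin";   // hypothetical socket path

      int fd = socket(AF_UNIX, SOCK_STREAM, 0);
      if (fd < 0) {perror("socket"); return 1;}

      sockaddr_un addr;
      std::memset(&addr, 0, sizeof(addr));
      addr.sun_family = AF_UNIX;
      std::strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

      // connect() fails unless we can access the owner-only socket file
      if (connect(fd, (sockaddr *)&addr, sizeof(addr)) < 0)
         {perror("connect"); close(fd); return 1;}

      // ... exchange admin requests and responses here ...
      close(fd);
      return 0;
  }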

>   I assume you mean NFS export, here? (As opposed to xrootd export ;-) It
> wasn't clear to me that people wanted to do that for /var partitions on the
> data movers, but it would eliminate Artem's problem.
Not really, you can just as easily export it via xrootd, can't you :-)

>   Copying the file to a 2nd server every time one is opened sounds like a
> lot of extra traffic. At a minimum it would double the I/O out of the data
> servers (and if the implementation wasn't done in a sensible way it could
> do more than that).
It's relative. At some point someone has to weigh the tradeoffs, and that's
somewhat subjective.

>   If the server has really/truly just gone down (something that shouldn't
> happen often) it is probably okay to simply delay the client. If it is a load
> issue, you can probably invent other ways to force the file replication only
> when it matters (i.e. when some server is over the load threshold). Anybody
> for a _2nd_ xrootd/olb system running on the same servers (without the
> load threshold) dedicated just to file replication?
I don't think you need two to do the job, and it would be very confusing
if you did it that way.

>   Doesn't solve the problem, see next point. Also exporting /opt (presumably
> via NFS, right?) from all data servers is also somewhat ugly.
No, xrootd.

>   I really dislike these little "turd" files (feels like VMS). Can't it just
> make a full copy in memory as it is read in the first time and allow that to
> be dumped via the administrative interface?
That's somewhat overkill.

>   If a data server olbd goes away and then comes back, does the manager
> olbd clear all entries in its cache from that server olbd?
Ah, no. Since it's very expensive (latency-wise) to rebuild the cache, the
manager forces the newly rejoined server to refresh only those entries that
are actually in the cache at the time the server rejoins.
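Very roughly, the bookkeeping amounts to something like this sketch (names
invented, not the olbd source):

  #include <string>
  #include <unordered_map>
  #include <vector>

  class ManagerCache {
  public:
     // A data server rejoined: the cache is NOT cleared (a full rebuild is
     // expensive).  Instead, return the paths already cached so the manager
     // can ask the rejoined server to refresh just those entries.
     std::vector<std::string> refreshListForRejoin() const {
        std::vector<std::string> paths;
        paths.reserve(entries.size());
        for (const auto &kv : entries) paths.push_back(kv.first);
        return paths;
     }
  private:
     // path -> list of servers believed to have that path
     std::unordered_map<std::string, std::vector<std::string>> entries;
  };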

>   Actually, what happens currently when a file is purged off disk because of
> the staging? (i.e. because space was needed to stage in some other file) I
> guess it is unlikely that this will happen within 8 hours of a file being
> used, but theoretically what happens? The next client (assuming that is also
> within the 8 hours) is redirected to where it was, doesn't find it and causes
> the refresh? i.e. the purging system doesn't actually propagate any info to the
> olbd cache? (Which would be fine, the system is robust without that...)
Yes. It's unlikely to cause any problems except an additional bounce of
the client. That's why keeping the information current is low on the
priority list. However, we do have plans to keep it current.
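For what it's worth, handling that bounce is conceptually just (again,
invented names, not the olbd code):

  #include <string>
  #include <unordered_map>

  class Redirector {
  public:
     // Client reports the cached server no longer has 'path' (it was purged).
     void fileMissingAt(const std::string &path, const std::string &server) {
        auto it = cache.find(path);
        if (it != cache.end() && it->second == server)
           cache.erase(it);                  // forget the stale location
     }
     // Next open: a cache miss triggers a fresh query of the data servers.
     std::string locate(const std::string &path) {
        auto it = cache.find(path);
        if (it != cache.end()) return it->second;
        std::string server = queryServers(path);   // the "refresh"
        cache[path] = server;
        return server;                       // client is redirected here
     }
  private:
     std::string queryServers(const std::string &path) {
        (void)path;                          // placeholder for the real lookup
        return "some.data.server";
     }
     std::unordered_map<std::string, std::string> cache;  // path -> server
  };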

>    This would be an admin function to the xrootd on the redirector, correct?
> Would it not be rather "hold all requests which would be redirected to
> server X for 10 minutes"?
Yes, something like that. Though, as I said, it isn't a deterministic
solution: the manager doesn't always know that it should be directing
clients to that server.
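Conceptually it's little more than a hold list with an expiry on the
manager side, something like (names invented, not the actual admin command):

  #include <string>
  #include <unordered_map>
  #include <ctime>

  class HoldList {
  public:
     // Admin request: hold redirections to 'server' for 'seconds'.
     void hold(const std::string &server, int seconds) {
        holds[server] = std::time(nullptr) + seconds;
     }
     // Checked by the redirector before sending a client to 'server';
     // held requests are delayed or sent elsewhere until the hold expires.
     bool isHeld(const std::string &server) const {
        auto it = holds.find(server);
        return it != holds.end() && std::time(nullptr) < it->second;
     }
  private:
     std::unordered_map<std::string, std::time_t> holds;  // server -> expiry
  };

Of course, that only covers the requests the manager actually sees for that
server, which is the non-deterministic part.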

Andy