Hi Matevz,
Some observations....
1) I think the issue with multiple caching proxies and the "stage" option
is that when the cmsd looks up the file it uses the standard stat() call
with whatever file path it was given. I doubt it will find the file that
way, because the cache's storage layout is not what the oss expects.
So, you need a special stat() to look in the right place and return
existence or not. You do this by coding your special stat plug-in for the
oss (see XrdOssStatInfo.hh for the plug-in interface and XrdOssSIgpfsT.cc
as an implementation example). Then in the config file for the cmsd's that
run on the proxy servers say:
oss.statlib <shared library that has the function>
Note that this can be packaged right into libXrdFileCache.so.
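The heart of such a plug-in is just a path translation followed by an
ordinary stat(). Here is a minimal illustration of that idea only -- it is
NOT the XrdOssStatInfo ABI (see XrdOssStatInfo.hh for the real interface),
and the cache root and function name are made-up placeholders:

```cpp
#include <cstdio>
#include <sys/stat.h>

// Hypothetical cache root; a real plug-in would read this from the config.
static const char *cacheRoot = "/data/xrdcache";

// Map the logical file name to where the caching proxy actually stores it,
// then do a plain stat() there. Returns 0 iff the cached file exists.
int CacheStat(const char *lfn, struct stat *buff)
{
   char pfn[4096];
   std::snprintf(pfn, sizeof(pfn), "%s%s", cacheRoot, lfn);
   return ::stat(pfn, buff);
}
```

A real plug-in would export this logic through the initialization entry
point declared in XrdOssStatInfo.hh, as the XrdOssSIgpfsT.cc example does.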
2) Please do not suffix the standard plug-in libraries with '-4'. That is
the version designation and is added automatically by the plug-in loader.
By hard-coding the suffix you have disabled name versioning, so the config
file will become outdated the moment we release a new version.
The standard ones are libXrdPss.so and libXrdFileCache.so (among others
but those two are relevant to you).
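Concretely, with versioning left to the loader, the two proxy-server
directives from the config quoted below would read:

```
ofs.osslib   /opt/xrootd/lib64/libXrdPss.so
pss.cachelib libXrdFileCache.so
```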
3) There is no such thing as libXrdOfs.so (well, not in 4.1 anyway). You
never needed to specify it at all, and it won't work in 4.1.
4) You really don't need the stagecmd at all. It is not required when the
role is manager, because the manager never touches disk space. In your
case it is not required for role server either, because there you replace
the oss with the pss, which does not care about the stagecmd. The cmsds,
even though they use the standard oss, drop the stagecmd requirement
because they will never stage, even when the stage option is specified.
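Under that reading, the export needs nothing beyond the stage option
itself (directive form assumed from the setup described in the quoted
message below):

```
all.export /store stage
# no oss.stagecmd needed -- the pss fetches remote data itself
```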
Andy
On Fri, 5 Dec 2014, Matevz Tadel wrote:
> Hi,
>
> Alja and I have been testing a caching-proxy cluster. The idea is that
> several independent caching proxies all report to a common redirector, so
> that one gets more disk space, better performance and redundancy. Here
> are the scripts we've been using:
>
> http://uaf-2.t2.ucsd.edu/~matevz/xrd/proxy-cluster/
>
> pooxy-klus.cfg is the config and start-pooxy.sh the startup script.
> cabinet-10-10-10 is the redirector.
>
> All servers export /store with the stage option, and there is a trivial
> oss.stagecmd /opt/stage-fake.pl that basically just touches the file,
> assuming that the proxy will then pull in the parts of the file that are
> actually needed.
>
> On proxy servers, we configure xrootd to use
> ofs.osslib /opt/xrootd/lib64/libXrdPss-4.so
> pss.origin xrootd.t2.ucsd.edu:1094
> pss.cachelib libXrdFileCache-4.so
> while cmsd just gets
> xrootd.fslib /opt/xrootd/lib64/libXrdOfs-4.so
> so that it is able to look what files already exist on the disk.
>
> The whole thing works ... but apparently the stage option is too strong.
> When opening a file that exists on proxy machine A, it also happens that
> the client gets redirected to machine X that does not have it -- so the
> file has to be pulled in from the remote location one more time.
>
> In particular, this happens after a cluster restart. xrdcp is used to
> access the files; xrdcp does a stat before the open. Both are redirected
> randomly the first time after the restart ... after that, the server that
> was selected for the open keeps getting selected consistently.
>
> Any ideas what we could do? Or should we be looking for a completely
> different solution?
>
> Could it be that the proxy cmsds are not seeing the files correctly? I
> tried bumping the trace level to debug, but there is no information there
> about found/not-found cmsd queries.
>
> Cheers,
> Matevz
>
> ########################################################################
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the XROOTD-DEV list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>