Okay, here is what I have, including logs from all of the relevant
services.

I restarted all of the services (after adding the traces as requested)
at around 14:30, so all of the servers came up then.
About 2 minutes later, I tried (from cephc01, using xrdcp) to copy a
file via the redirector xrootd (on cephs03). The request only seems to
show up in the redirector xrootd log itself; there's no record of
anything in the other logs as far as I can see.

Sam



redirector cmsd

191203 14:30:14 18957 Protocol: Primary server.202124:22@cephc01:1094 logged in.
191203 14:30:14 18957 Protocol: server.202124:22@cephc01:1094 system ID: [log in to unmask] 1213cephs03.beowulf.cluster
=====> Routing for 10.1.50.11: local pub4 prv4
=====> Route all4: 10.1.50.11 Dest=[::10.1.50.11]:1094
191203 14:31:16 18939 Config: manager service enabled.
191203 14:31:16 18953 State: Status changed to active + staging


-----
redirector xrootd

------ xrootd [log in to unmask]:1094 initialization completed.
191203 14:32:26 18970 XrootdXeq: root.218779:20@cephc01 pvt IPv4 login
191203 14:32:26 18970 root.218779:20@cephc01 XrootdResponse: sending err 3011: No servers have read access to the file
191203 14:32:26 18970 XrootdXeq: root.218779:20@cephc01 disc 0:00:00


-----
server cmsd

------ cmsd [log in to unmask]:46427 initialization completed.
191203 14:30:14 218768 do_Login:: Primary server 218748 logged in; data port is 1094
Config Connecting to 1 manager and 1 site.
191203 14:30:14 218729 Config: server service enabled.
191203 14:30:14 218770 State: Status changed to active + staging
191203 14:30:14 218740 ManTree: Now connected to 1 root node(s)
191203 14:30:14 218740 Protocol: Logged into cephs03


-----
server xrootd

191203 14:30:14 218765 cms_Finder: Connected to cmsd via /tmp/cache/.olb/olbd.admin
------ xrootd protocol initialization completed.
------ xrootd [log in to unmask]:1094 initialization completed.
191203 14:30:15 218764 XrdFileCache_Manager: info Cache::Purge() Started.
191203 14:30:15 218764 XrdFileCache_Manager: info Cache::Purge() Finished, removed 0 data files, total size 0, bytes to remove at end: 0
191203 14:35:15 218764 XrdFileCache_Manager: info Cache::Purge() Started.
191203 14:35:15 218764 XrdFileCache_Manager: info Cache::Purge() Finished, removed 0 data files, total size 0, bytes to remove at end: 0
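
To make it easier to see which daemons logged anything around the time
of the failed xrdcp, the four logs above could be merged into a single
timeline. The following is only a minimal sketch: it assumes that every
timestamped line has the "YYMMDD HH:MM:SS pid message" shape seen above,
and the function names and the labels used for each log are made up for
illustration.

```python
import re
from datetime import datetime

# Timestamped lines look like "191203 14:32:26 18970 <message>" in the logs above.
LINE_RE = re.compile(r"^(\d{6} \d{2}:\d{2}:\d{2}) (\d+) (.*)$")


def parse_log(source, text):
    """Return (timestamp, source, pid, message) tuples for one log.

    Lines without a leading timestamp (banners, wrapped continuations) are
    appended to the previous entry's message, or skipped if there is none.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%y%m%d %H:%M:%S")
            entries.append([ts, source, int(m.group(2)), m.group(3)])
        elif entries:
            entries[-1][3] += " " + line
    return [tuple(e) for e in entries]


def merged_timeline(logs):
    """Merge {label: log_text} into one list sorted by timestamp."""
    merged = []
    for source, text in logs.items():
        merged.extend(parse_log(source, text))
    # sorted() is stable, so lines with equal timestamps keep their input order.
    return sorted(merged, key=lambda e: e[0])


if __name__ == "__main__":
    # Hypothetical labels; the log text is copied from the excerpts above.
    logs = {
        "redirector xrootd": (
            "191203 14:32:26 18970 root.218779:20@cephc01 XrootdResponse: sending\n"
            "err 3011: No servers have read access to the file\n"
        ),
        "server cmsd": "191203 14:30:14 218740 Protocol: Logged into cephs03\n",
    }
    for ts, source, pid, msg in merged_timeline(logs):
        print(ts.time(), source, pid, msg)
```

With the full logs loaded, any entry from the redirector cmsd or the
server daemons at 14:32:26 would then show up next to the "err 3011"
line.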

On Mon, 2 Dec 2019 at 19:33, Matevz Tadel <[log in to unmask]> wrote:
>
> I'd try this:
>
> redirector:
> xrootd.trace emsg login stall redirect
> xrd.trace conn
> cms.trace defer files redirect stage
>
> server:
> # For debug, to see files being searched
> # cms.trace    defer files redirect stage
> cms.trace    defer redirect stage
>
>
> You say xrdmapc shows the configured servers, right?
>
> We had some trouble with ipv4/6 at ucsd lately, clients will be redirected to
> ipv6 servers only if they come in via ipv6 to the redirector.
>
> Can you restart redirector cmsd and then (after 30sec) look at:
>
> [1131] root@xrootd /var/log/xrootd/xcacheucsd# grep Routing cmsd.log | sort
>
> =====> Routing for bcache-1.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for bcache-1.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-00.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-01.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-02.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-03.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-04.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-05.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-06.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-07.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-08.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-09.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-10.t2.ucsd.edu: local pub4 prv4
> =====> Routing for xcache-11.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xrd-cache-1.ultralight.org: local pub4 prv4 pub6 prv6
> =====> Routing for xrd-cache-2.ultralight.org: local pub4 prv4 pub6 prv6
>
> Matevz
>
> On 2019-12-02 11:24, Sam Skipsey wrote:
> > No, I explicitly did that. (As I noted, there's a typo for that in the
> > example, as it uses "rw" not "r/w" , which doesn't work).
> >
> > I've tried basically every variation of stage/nostage/ r/w / r/o at
> > different parts of the network, but the manager cmsd never seems to
> > actually consider the servers (even when I've already pre-staged the
> > file it's looking for by directly talking to the server xrootd service
> > and getting it to cache).
> >
> > Sam
> >
> > On Mon, 2 Dec 2019 at 19:09, Matevz Tadel <[log in to unmask]> wrote:
> >>
> >> Hi,
> >>
> >> Before I go looking at what's wrong on the web page, have a look at this, page 21:
> >>
> >> https://indico.cern.ch/event/727208/contributions/3444604/
> >>
> >> Maybe you're missing the r/w for xrootd, stage r/o for cmsd trick?
> >>
> >> Cheers,
> >> Matevz
> >>
> >> On 2019-11-28 08:22, Sam Skipsey wrote:
> >>> Hello everyone,
> >>>
> >>> So, I have another question, working entirely from the documentation
> >>> on xrootd.org
> >>>
> >>> In the documentation for cache configuration, there's an example of
> >>> how to set up a cluster of disk caching proxies:
> >>>
> >>> https://xrootd.slac.stanford.edu/doc/dev410/pss_config.pdf [page 38,
> >>> you can't copy it because, weirdly, it's an image]
> >>>
> >>> I'm following that exactly (except for fixing the typo where the
> >>> example has an export using "rw" and not "r/w" as an option), and,
> >>> well, it just doesn't seem to work.
> >>>
> >>> If I talk directly to the server that the proxies talk to: I can get a file.
> >>> If I talk to an individual proxy: I can also get a file (and it is cached)
> >>> If I talk to the *redirector*, I get, with debugging on "Open has
> >>> returned with status [ERROR] Server responded with an error: [3011] No
> >>> servers have read access to the file"
> >>>
> >>> The redirector logs show that the cmsd on the proxy logs in (and is
> >>> listed as a "server" in its list of servers), and the proxy cmsd logs
> >>> also show that it happily registers to the redirector.
> >>>
> >>> How do I debug this?
> >>>
> >>> I've already tried adding and removing options to the various exports,
> >>> making sure that all the relevant ports are open, etc.
> >>>
> >>> Sam
> >>>
> >>> ########################################################################
> >>> Use REPLY-ALL to reply to list
> >>>
> >>> To unsubscribe from the XROOTD-L list, click the following link:
> >>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
> >>>
> >>
>
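
The "grep Routing cmsd.log | sort" check suggested in the thread can
also be scripted, so that hosts whose routing line lacks pub6/prv6 stand
out. This is a rough sketch only: the function names are hypothetical,
and it assumes the tags after the colon are whitespace-separated, as in
the lines quoted above.

```python
import re

# Matches lines such as "=====> Routing for 10.1.50.11: local pub4 prv4",
# with or without a leading "> " quote prefix.
ROUTING_RE = re.compile(r"=+>\s*Routing for\s+(\S+):\s*(.*)$")


def routing_table(log_text):
    """Map each host to the set of tags on its 'Routing for' line."""
    table = {}
    for line in log_text.splitlines():
        m = ROUTING_RE.search(line.strip())
        if m:
            table[m.group(1)] = set(m.group(2).split())
    return table


def ipv4_only_hosts(table):
    """Hosts whose routing line carries neither a pub6 nor a prv6 tag."""
    return sorted(h for h, tags in table.items() if not tags & {"pub6", "prv6"})


if __name__ == "__main__":
    # Sample lines copied from the logs in this thread.
    sample = (
        "=====> Routing for 10.1.50.11: local pub4 prv4\n"
        "=====> Routing for xcache-00.t2.ucsd.edu: local pub4 prv4 pub6 prv6\n"
        "=====> Routing for xcache-10.t2.ucsd.edu: local pub4 prv4\n"
    )
    for host in ipv4_only_hosts(routing_table(sample)):
        print(host)
```

In practice the input would be the contents of the redirector's
cmsd.log, read about 30 seconds after restarting cmsd.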
