Okay, so what I have, including logs from all the relevant bits is below.

I restarted all of the services (after adding the traces as requested) at around 14:30, so all of the servers come up. About 2 minutes later, I tried (from cephc01, using xrdcp) to copy a file via the redirector xrootd (on cephs03). This only seems to show up on the redirector xrootd itself - there's no record of anything in the other logs as far as I can see.

Sam

----- redirector cmsd
191203 14:30:14 18957 Protocol: Primary server.202124:22@cephc01:1094 logged in.
191203 14:30:14 18957 Protocol: server.202124:22@cephc01:1094 system ID: [log in to unmask] 1213cephs03.beowulf.cluster
=====> Routing for 10.1.50.11: local pub4 prv4
=====> Route all4: 10.1.50.11 Dest=[::10.1.50.11]:1094
191203 14:31:16 18939 Config: manager service enabled.
191203 14:31:16 18953 State: Status changed to active + staging

----- redirector xrootd
------ xrootd [log in to unmask]:1094 initialization completed.
191203 14:32:26 18970 XrootdXeq: root.218779:20@cephc01 pvt IPv4 login
191203 14:32:26 18970 root.218779:20@cephc01 XrootdResponse: sending err 3011: No servers have read access to the file
191203 14:32:26 18970 XrootdXeq: root.218779:20@cephc01 disc 0:00:00

----- server cmsd
------ cmsd [log in to unmask]:46427 initialization completed.
191203 14:30:14 218768 do_Login:: Primary server 218748 logged in; data port is 1094
Config Connecting to 1 manager and 1 site.
191203 14:30:14 218729 Config: server service enabled.
191203 14:30:14 218770 State: Status changed to active + staging
191203 14:30:14 218740 ManTree: Now connected to 1 root node(s)
191203 14:30:14 218740 Protocol: Logged into cephs03

----- server xrootd
191203 14:30:14 218765 cms_Finder: Connected to cmsd via /tmp/cache/.olb/olbd.admin
------ xrootd protocol initialization completed.
------ xrootd [log in to unmask]:1094 initialization completed.
191203 14:30:15 218764 XrdFileCache_Manager: info Cache::Purge() Started.
191203 14:30:15 218764 XrdFileCache_Manager: info Cache::Purge() Finished, removed 0 data files, total size 0, bytes to remove at end: 0
191203 14:35:15 218764 XrdFileCache_Manager: info Cache::Purge() Started.
191203 14:35:15 218764 XrdFileCache_Manager: info Cache::Purge() Finished, removed 0 data files, total size 0, bytes to remove at end: 0

On Mon, 2 Dec 2019 at 19:33, Matevz Tadel <[log in to unmask]> wrote:
>
> I'd try this:
>
> redirector:
> xrootd.trace emsg login stall redirect
> xrd.trace conn
> cms.trace defer files redirect stage
>
> server:
> # For debug, to see files being searched
> # cms.trace defer files redirect stage
> cms.trace defer redirect stage
>
>
> You say xrdmapc shows the configured servers, right?
>
> We had some trouble with ipv4/6 at ucsd lately, clients will be redirected to
> ipv6 servers only if they come in via ipv6 to the redirector.
>
> Can you restart redirector cmsd and then (after 30sec) look at:
>
> [1131] root@xrootd /var/log/xrootd/xcacheucsd# grep Routing cmsd.log | sort
>
> =====> Routing for bcache-1.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for bcache-1.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-00.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-01.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-02.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-03.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-04.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-05.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-06.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-07.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-08.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-09.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xcache-10.t2.ucsd.edu: local pub4 prv4
> =====> Routing for xcache-11.t2.ucsd.edu: local pub4 prv4 pub6 prv6
> =====> Routing for xrd-cache-1.ultralight.org: local pub4 prv4 pub6 prv6
> =====> Routing for xrd-cache-2.ultralight.org: local pub4 prv4 pub6 prv6
>
> Matevz
>
> On 2019-12-02 11:24, Sam Skipsey wrote:
> > No, I explicitly did that. (As I noted, there's a typo for that in the
> > example, as it uses "rw" not "r/w" , which doesn't work).
> >
> > I've tried basically every variation of stage/nostage/ r/w / r/o at
> > different parts of the network, but the manager cmsd never seems to
> > actually consider the servers (even when I've already pre-staged the
> > file it's looking for by directly talking to the server xrootd service
> > and getting it to cache).
> >
> > Sam
> >
> > On Mon, 2 Dec 2019 at 19:09, Matevz Tadel <[log in to unmask]> wrote:
> >>
> >> Hi,
> >>
> >> Before I go looking at what's wrong on the web page, have a look at this, page 21:
> >>
> >> https://indico.cern.ch/event/727208/contributions/3444604/
> >>
> >> Maybe you're missing the r/w for xrootd, stage r/o for cmsd trick?
> >>
> >> Cheers,
> >> Matevz
> >>
> >> On 2019-11-28 08:22, Sam Skipsey wrote:
> >>> Hello everyone,
> >>>
> >>> So, I have another question, working entirely from the documentation
> >>> on xrootd.org
> >>>
> >>> In the documentation for cache configuration, there's an example of
> >>> how to set up a cluster of disk caching proxies:
> >>>
> >>> https://xrootd.slac.stanford.edu/doc/dev410/pss_config.pdf [page 38,
> >>> you can't copy it because weirdly its an image]
> >>>
> >>> I'm following that exactly (except for fixing the typo where the
> >>> example has an export using "rw" and not "r/w" as an option), and,
> >>> well, it just doesn't seem to work.
> >>>
> >>> If I talk directly to the server that the proxies talk to: I can get a file.
> >>> If I talk to an individual proxy: I can also get a file (and it is cached)
> >>> If I talk to the *redirector*, I get, with debugging on "Open has
> >>> returned with status [ERROR] Server responded with an error: [3011] No
> >>> servers have read access to the file"
> >>>
> >>> The redirector logs show that the cmsd on the proxy logs in (and is
> >>> listed as a "server" in its list of servers), and the proxy cmsd logs
> >>> also show that it happily registers to the redirector.
> >>>
> >>> How do I debug this?
> >>>
> >>> I've already tried adding and removing options to the various exports,
> >>> making sure that all the relevant ports are open, etc.
> >>>
> >>> Sam
> >>>
> >>> ########################################################################
> >>> Use REPLY-ALL to reply to list
> >>>
> >>> To unsubscribe from the XROOTD-L list, click the following link:
> >>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
> >>>
> >>
>
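
For the routing check suggested in the quoted message (grep Routing cmsd.log | sort), a minimal Python sketch of the same comparison follows. It assumes the "=====> Routing for HOST: ..." line format shown in the logs in this thread, and it assumes that the pub6/prv6 tags indicate an IPv6 route; that reading of the tags is an assumption, not something the thread confirms. The function names parse_routing and ipv4_only are hypothetical helpers, not part of XRootD.

```python
import re

# Matches lines such as "=====> Routing for host: local pub4 prv4 pub6 prv6"
# (format copied from the cmsd log excerpts in this thread).
ROUTING_RE = re.compile(r"=====> Routing for (\S+): (.*)")


def parse_routing(lines):
    """Map each host to the set of route tags printed on its Routing line."""
    routes = {}
    for line in lines:
        match = ROUTING_RE.search(line)
        if match:
            routes[match.group(1)] = set(match.group(2).split())
    return routes


def ipv4_only(routes):
    """Return hosts whose Routing line carries no pub6/prv6 tag.

    Assumption: those tags mark IPv6 reachability.
    """
    return sorted(h for h, tags in routes.items() if not tags & {"pub6", "prv6"})


# Sample lines taken from the logs quoted in this thread.
sample = [
    "=====> Routing for 10.1.50.11: local pub4 prv4",
    "=====> Route all4: 10.1.50.11 Dest=[::10.1.50.11]:1094",
    "=====> Routing for xcache-09.t2.ucsd.edu: local pub4 prv4 pub6 prv6",
    "=====> Routing for xcache-10.t2.ucsd.edu: local pub4 prv4",
]
print(ipv4_only(parse_routing(sample)))
```

To run it against a real log, pass the open file object instead of the sample list, for example parse_routing(open("cmsd.log")). The resulting list can then be compared with the address family the client used to reach the redirector ("pvt IPv4 login" in the redirector xrootd log above).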