Hi Petr,

I am looking into this issue. I have seen a similar issue with 5.0.3 as well. Now, with 5.1.0rc4 at PRAGUE and SLAC (and the SLAC smoke test client), what I see seems to indicate two problems. Both happen even without TPC, but both seem to be related to TLS:

1) xrdcp -f SLAC PRAGUE (no TPC) often gets:

[2021-01-02 13:17:29.783436 -0800][Dump   ][File              ] [0x1047a90@root://griddev03.slac.stanford.edu:2094//xrootd/atlas/tpctest/wei.yang.junk01?xrdcl.requuid=c117a90e-09f3-4816-9eda-11dbee617ddc] File state error encountered. Message kXR_read (handle: 0x00000000, offset: 25165824, size: 8388608) returned with [FATAL] TLS error: resource temporarily unavailable

But it only happens when the copy is between these two endpoints (while "xrdcp -f SLAC /dev/null" works fine).

2) The same xrdcp -f SLAC PRAGUE (no TPC) can also hang on the PRAGUE side:

[2021-01-02 13:25:41.534359 -0800][Debug  ][PostMaster        ] Creating new channel to: root://golias100.farm.particle.cz:1094
[2021-01-02 13:25:41.534395 -0800][Debug  ][PostMaster        ] [golias100.farm.particle.cz:1094] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Window: 1800
[2021-01-02 13:25:41.534502 -0800][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: root://golias100.farm.particle.cz:1094" to be run at: [2021-01-02 13:25:56 -0800]
[2021-01-02 13:25:41.534535 -0800][Dump   ][PostMaster        ] [golias100.farm.particle.cz:1094] Sending message kXR_stat (path: /dpm/farm.particle.cz/home/dteam/tpc-test/wei.yang.junk01, flags: none) (0x8c5620) through substream 0 expecting answer at 0
[2021-01-02 13:25:41.535374 -0800][Debug  ][PostMaster        ] [golias100.farm.particle.cz:1094] Found 2 address(es): [::ffff:147.231.25.100]:1094, [::ffff:172.16.0.100]:1094
[2021-01-02 13:25:41.535483 -0800][Debug  ][AsyncSock         ] [golias100.farm.particle.cz:1094.0] Attempting connection to [::ffff:172.16.0.100]:1094
[2021-01-02 13:25:41.535533 -0800][Debug  ][Poller            ] Adding socket 0x8cb6e0 to the poller
[2021-01-02 13:25:42.534202 -0800][Dump   ][TaskMgr           ] Running task: "FileTimer task"
[2021-01-02 13:25:42.534263 -0800][Dump   ][TaskMgr           ] Will rerun task "FileTimer task" at [2021-01-02 13:25:57 -0800]
[2021-01-02 13:25:56.535650 -0800][Dump   ][TaskMgr           ] Running task: "TickGeneratorTask for: root://golias100.farm.particle.cz:1094"
[2021-01-02 13:25:56.535726 -0800][Dump   ][TaskMgr           ] Will rerun task "TickGeneratorTask for: root://golias100.farm.particle.cz:1094" at [2021-01-02 13:26:11 -0800]
[2021-01-02 13:25:57.535833 -0800][Dump   ][TaskMgr           ] Running task: "FileTimer task"
[2021-01-02 13:25:57.535902 -0800][Dump   ][TaskMgr           ] Will rerun task "FileTimer task" at [2021-01-02 13:26:12 -0800]
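In a longer client log, a hang like this can be found by looking for the last entry that is not a periodic timer task and measuring how long the log keeps running after it. The following is a minimal Python sketch, not part of XRootD. It assumes the bracketed "[timestamp][level][topic] message" layout shown above and assumes that "TaskMgr" entries are only timer noise; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches client log lines such as:
# [2021-01-02 13:25:41.535533 -0800][Debug  ][Poller            ] Adding socket ...
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) (?P<tz>[+-]\d{4})\]"
    r"\[(?P<level>[^\]]*)\]\[(?P<topic>[^\]]*)\]\s?(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, topic, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"] + " " + m["tz"], "%Y-%m-%d %H:%M:%S.%f %z")
    return ts, m["level"].strip(), m["topic"].strip(), m["msg"]


def stall_after_last_activity(lines, timer_topics=("TaskMgr",)):
    """Return (seconds, message): how long the log continues after the last
    entry whose topic is not in timer_topics, and what that entry said.

    Returns None if the log contains no such entry.
    """
    last_active = None  # last entry that is not timer noise
    last_seen = None    # last parsable entry of any kind
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        last_seen = parsed
        if parsed[2] not in timer_topics:
            last_active = parsed
    if last_active is None:
        return None
    return (last_seen[0] - last_active[0]).total_seconds(), last_active[3]
```

Applied to the excerpt above, the last non-timer entry is the "Adding socket 0x8cb6e0 to the poller" line, and only timer entries follow it for about 16 seconds.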

Is this load related? 

I also saw a few successful TPC transfers from SLAC to PRAGUE, but the success rate is low. I think I will start with the first problem.

regards,
--
Wei Yang  |  mailto:[log in to unmask]  |  650-926-3338(O)

On 12/31/20, 12:19 AM, "mailto:[log in to unmask] on behalf of Petr Vokac" <mailto:[log in to unmask] on behalf of mailto:[log in to unmask]> wrote:

Hi,

xroot-tpc with TLS fails quite often (~10%?) with DPM on top of XRootD 5.1.0 release candidate 4. Is this something DPM-specific or a more general problem within XRootD? (I also saw failures with DPM on top of XRootD 5.0.3, which is why I moved to 5.1.0 to see if the latest version works.)
...
201231 08:36:17 238312 vokac.30093:70@ui1 XrdPoll: Poller 2 removing FD 70
201231 08:36:17 238917 XrdPoll: Poller 2 enabled vokac.30093:70@ui1
...
201231 08:36:18 239380 vokac.30093:70@ui1 XrdPoll: Poller 2 async disabling link 70
201231 08:36:18 239380 vokac.30093:70@ui1 XrdPoll: Link 70 terminating: ended by vokac.30093:107@ui1
201231 08:36:18 238309 XrdSched: running vokac.30093:70@ui1 inq=0
201231 08:36:18 238309 vokac.30093:70@ui1 TLS_Shutdown: Doing fast shutdown.
201231 08:36:18 238309 XrdTLS: vokac.30093:70@ui1 FD 70 TLS shutdown failed; broken pipe.
201231 08:36:18 238309 XrootdXeq: vokac.30093:70@ui1 disc 0:00:02 (ended by vokac.30093:107@ui1)
201231 08:36:18 238309 vokac.30093:70@ui1 XrootdFile: closing w /dpm/farm.particle.cz/home/atlas/atlasdatadisk/SAM/xta
201231 08:36:18 238309 vokac.30093:70@ui1 ofs_close: use=1 fn=/dpm/farm.particle.cz/home/atlas/atlasdatadisk/SAM/xta
201231 08:36:18 106576 vokac.30093:70@ui1 ofs_Xeq: TPC job 43: ended with rc=0
201231 08:36:18 238731 XrootdDoIt: vokac.30093:70@ui1 sync async callback
201231 08:36:18 238731 XrootdsendResp: vokac.30093:70@ui1 sync async resp aborted; user gone.
201231 08:36:18 238309 vokac.30093:70@ui1 dpmoss_Close: doneWriting
201231 08:36:18 238370 sysXrdHttp:  rc:91 got hdr line: remoteclientdn: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=vokac/CN=610071/CN=Petr Vokac
201231 08:36:18 238309 vokac.30093:70@ui1 dpmoss_Close: return 0
201231 08:36:18 238309 vokac.30093:70@ui1 XrdPoll: Poller 2 removing FD 70
201231 08:36:18 238309 vokac.30093:70@ui1 XrdPoll: FD 70 detached from poller 2; num=24
201231 08:36:18 239380 vokac.30093:70@ui1 XrdLink: Terminate complete -32
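To follow a single connection through a full server log, the lines mentioning one link identifier can be pulled out and kept together with their thread ids. This is a minimal Python sketch, not an XRootD tool. It assumes each server log line starts with "YYMMDD HH:MM:SS <thread id>", as the excerpt above suggests, and the function name is made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout: "201231 08:36:18 238309 <rest of the line>"
SERVER_LINE_RE = re.compile(r"^(\d{6} \d{2}:\d{2}:\d{2}) (\d+) (.*)$")


def entries_for_link(lines, link_id):
    """Return [(timestamp, thread_id, text)] for lines that mention link_id.

    The match is a plain substring test, so pass the complete identifier
    (for example "vokac.30093:70@ui1") to avoid catching other links.
    """
    entries = []
    for line in lines:
        m = SERVER_LINE_RE.match(line.rstrip("\n"))
        if m is None or link_id not in m.group(3):
            continue
        timestamp = datetime.strptime(m.group(1), "%y%m%d %H:%M:%S")
        entries.append((timestamp, int(m.group(2)), m.group(3)))
    return entries
```

For the excerpt above, filtering on "vokac.30093:70@ui1" drops unrelated entries such as the sysXrdHttp line and shows that several different threads handled the same link during shutdown.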

The attachment contains the full XRootD server log for the time period of this transfer.
xrdcopy --cksum adler32:source --tpc delegate only root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz/home/atlas/atlasdatadisk/SAM/vokac.src/1G/19 roots://golias100.farm.particle.cz:1094//dpm/farm.particle.cz/home/atlas/atlasdatadisk/SAM/xtg
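To replace the rough "~10%" estimate with a measured number, a command like the one above could be run in a loop while counting non-zero exit codes and timeouts. The following is a minimal Python sketch under stated assumptions: the helper names are hypothetical, a non-zero exit code is treated as a failed transfer, and each attempt writes to a different destination (here a numeric suffix, which is not in the original command) so that earlier copies are not in the way.

```python
import subprocess


def run_repeatedly(build_command, attempts=20, timeout_seconds=600):
    """Run build_command(i) for each attempt and collect the failed ones.

    Returns a list of (attempt, return_code, stderr_tail). return_code is
    None when the command hit the timeout, for example a hung transfer.
    """
    failures = []
    for attempt in range(attempts):
        command = build_command(attempt)
        try:
            result = subprocess.run(
                command, capture_output=True, text=True, timeout=timeout_seconds
            )
        except subprocess.TimeoutExpired:
            failures.append((attempt, None, "timed out"))
            continue
        if result.returncode != 0:
            # Keep only the end of stderr, where the final error usually is.
            failures.append((attempt, result.returncode, result.stderr.strip()[-500:]))
    return failures


def example_command(attempt):
    """Build the xrdcopy command from above for one attempt."""
    src = ("root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz/home/"
           "atlas/atlasdatadisk/SAM/vokac.src/1G/19")
    # Hypothetical: suffix the destination so each attempt gets its own file.
    dst = ("roots://golias100.farm.particle.cz:1094//dpm/farm.particle.cz/home/"
           "atlas/atlasdatadisk/SAM/xtg.%d" % attempt)
    return ["xrdcopy", "--cksum", "adler32:source", "--tpc", "delegate", "only", src, dst]
```

Calling run_repeatedly(example_command, attempts=50) and dividing the length of the result by 50 gives the observed failure rate; the collected stderr tails show whether the failures are the same TLS error each time.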

Petr

On 12/19/20 9:20 PM, Michal Simon wrote:
Dear All,

    The XRootD 5.1.0 release candidate 4 has been tagged:
    https://github.com/xrootd/xrootd/releases/tag/v5.1.0-rc4

    The source RPM is available at:
    https://xrootd.cern.ch/sw/releases/5.1.0-rc4/rpms/source

    The RPMs are available in the XRootD testing repositories:
    http://xrootd.org/binaries/xrootd-testing-slc7.repo
    http://xrootd.org/binaries/xrootd-testing-slc8.repo

Kind regards,
    Michal


Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1 


Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1 

