Hello all,

I am new to xrootd and am evaluating a test setup in a 10 Gbit network.

When I copy a 1 GB test file in a loop, xrootd 4.8.1 stalls roughly once
every 4 to 7 copies.

The copy goes from RAM over the network to RAM, to exclude disk I/O.
Sometimes xrootd recovers on its own; sometimes the copy job has to be
killed. Restarting the xrootd job returns everything to normal until the
stall reappears.

I would expect xrootd not to stall even under such circumstances, but I
agree this is a somewhat artificial use case.


Best

Heiko


xrootd.cf, Ver 4.8.1:

all.export /xrootd

set xrdr=REDIRECTOR
set inventory=/var/log/xrootd/inventory
all.manager $(xrdr):3121

if $(xrdr) && named cns
       all.export $(inventory)
       xrd.port 1095
else if $(xrdr)
       all.role manager
       ofs.forward 3way $(xrdr):1095 mv rm rmdir trunc
       xrd.port 1094
else
       all.role server
       ofs.notify closew create mkdir mv rm rmdir trunc | /usr/bin/XrdCnsd -d -D 2 -i 90 -b $(xrdr):1095:$(inventory)
       ofs.notifymsg create $TID create $FMODE $LFN?$CGI
       ofs.notifymsg closew $TID closew $LFN $FSIZE
fi


The brute force test:

for ((i=0; i<=100; i++)); do
    rm -f /mnt/ramdisk/test.dat
    xrdcp -d 3 -v root://REDIRECTOR//xrootd/test.dat /mnt/ramdisk/test.dat
    rm -f /mnt/ramdisk/test.dat
    sleep 1
done
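
Because a stalled copy sometimes has to be killed by hand, the loop above can be bounded per copy so that stalls are counted instead of blocking the run. The following is only a minimal sketch: the function name `run_copies`, its argument order, and the 120-second limit in the usage line are assumptions made for illustration. It relies on `timeout` from GNU coreutils, which exits with status 124 when it had to terminate the command (137 if the follow-up KILL was needed), and it omits the one-second pause between copies.

```shell
# Sketch only: run_copies is a hypothetical helper, not part of xrootd.
# Usage: run_copies RUNS LIMIT_SECONDS DEST COMMAND [ARGS...]
run_copies() {
    runs=$1; limit=$2; dest=$3; shift 3
    i=0; stalled=0; failed=0
    while [ "$i" -lt "$runs" ]; do
        rm -f "$dest"
        # Send TERM after $limit seconds, and KILL 5 seconds later if needed.
        timeout -k 5 "$limit" "$@"
        rc=$?
        if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
            stalled=$((stalled + 1))
            echo "run $i: killed after ${limit}s (possible stall)"
        elif [ "$rc" -ne 0 ]; then
            failed=$((failed + 1))
            echo "run $i: failed with exit code $rc"
        fi
        i=$((i + 1))
    done
    rm -f "$dest"
    echo "runs=$runs stalled=$stalled failed=$failed"
}

# Example call mirroring the loop above (the 120 s limit is an assumption):
# run_copies 100 120 /mnt/ramdisk/test.dat \
#     xrdcp -d 3 -v root://REDIRECTOR//xrootd/test.dat /mnt/ramdisk/test.dat
```

The last line of output summarises how many copies had to be terminated, which gives a rough stall rate to compare between configurations.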


xrdcp debug output:

[2018-04-04 14:42:59.221301 +0200][Debug  ][File              ] 
[0x24e27a0@file://localhost/mnt/ramdisk/test.dat?oss.asize=1073741824] 
Sending a write command for handle 0xb to localhost
[2018-04-04 14:42:59.228163 +0200][Dump   ][Utility           ] URL: 
file://localhost/mnt/ramdisk/test.dat?oss.asize=1073741824
[2018-04-04 14:42:59.228163 +0200][Dump   ][Utility           ] 
Protocol:  file
[2018-04-04 14:42:59.228163 +0200][Dump   ][Utility           ] User Name:
[2018-04-04 14:42:59.228163 +0200][Dump   ][Utility           ] Password:
[2018-04-04 14:42:59.228163 +0200][Dump   ][Utility           ] Host 
Name: localhost
[2018-04-04 14:42:59.228163 +0200][Dump   ][Utility           ] 
Port:      1094
[2018-04-04 14:42:59.228163 +0200][Dump   ][Utility           ] 
Path:      /mnt/ramdisk/test.dat
[2018-04-04 14:42:59.228229 +0200][Debug  ][File              ] 
[0x24dd0d0@root://REDIRECTOR:1094//xrootd/test.dat] Sending a read 
command for handle 0x0 to 192.168.16.120:1094
[2018-04-04 14:42:59.228233 +0200][Dump   ][File              ] 
[0x24e27a0@file://localhost/mnt/ramdisk/test.dat?oss.asize=1073741824] 
Got state response for message kXR_write (handle: 0x0b000000, offset: 
503316480, size: 16777216)
[2018-04-04 14:42:59.228254 +0200][Dump   ][XRootD            ] 
[192.168.16.120:1094] Sending message kXR_read (handle: 0x00000000, 
offset: 570425344, size: 16777216)
[2018-04-04 14:42:59.228272 +0200][Dump   ][PostMaster        ] 
[192.168.16.120:1094 #0] Sending message kXR_read (handle: 0x00000000, 
offset: 570425344, size: 16777216) (0x24dd9e0) through substream 0 
expecting answer at 0
[2018-04-04 14:42:59.228305 +0200][Dump   ][AsyncSock         ] 
[192.168.16.120:1094 #0.0] Wrote a message: kXR_read (handle: 
0x00000000, offset: 570425344, size: 16777216) (0x24dd9e0), 32 bytes
[2018-04-04 14:42:59.228329 +0200][Dump   ][AsyncSock         ] 
[192.168.16.120:1094 #0.0] Successfully sent message: kXR_read (handle: 
0x00000000, offset: 570425344, size: 16777216) (0x24dd9e0).
[2018-04-04 14:42:59.228340 +0200][Dump   ][XRootD            ] 
[192.168.16.120:1094] Message kXR_read (handle: 0x00000000, offset: 
570425344, size: 16777216) has been successfully sent.
[2018-04-04 14:42:59.228353 +0200][Dump   ][PostMaster        ] 
[192.168.16.120:1094 #0.0] All messages consumed, disable uplink
[2018-04-04 14:42:59.750894 +0200][Dump   ][TaskMgr           ] Running 
task: "FileTimer task"
[2018-04-04 14:42:59.750934 +0200][Dump   ][TaskMgr           ] Will 
rerun task "FileTimer task" at [2018-04-04 14:43:14 +0200]
[2018-04-04 14:43:13.464015 +0200][Dump   ][XRootDTransport   ] 
[REDIRECTOR:1094 #0.0] Stream inactive since 15 seconds, TTL: 1200, 
allocated SIDs: 0, open files: 0
[2018-04-04 14:43:13.464039 +0200][Dump   ][XRootDTransport   ] 
[REDIRECTOR:1094 #0.0] Stream inactive since 15 seconds, stream timeout: 
60, allocated SIDs: 0, wait barrier: 2018-04-04 14:42:58 +0200
[2018-04-04 14:43:13.751694 +0200][Dump   ][TaskMgr           ] Running 
task: "TickGeneratorTask for: REDIRECTOR:1094"
[2018-04-04 14:43:13.751737 +0200][Dump   ][TaskMgr           ] Will 
rerun task "TickGeneratorTask for: REDIRECTOR:1094" at [2018-04-04 
14:43:28 +0200]
[2018-04-04 14:43:13.751753 +0200][Dump   ][TaskMgr           ] Running 
task: "TickGeneratorTask for: 192.168.16.120:1094"
[2018-04-04 14:43:13.751764 +0200][Dump   ][TaskMgr           ] Will 
rerun task "TickGeneratorTask for: 192.168.16.120:1094" at [2018-04-04 
14:43:28 +0200]
[2018-04-04 14:43:14.751830 +0200][Dump   ][TaskMgr           ] Running 
task: "FileTimer task"
[2018-04-04 14:43:14.751849 +0200][Dump   ][TaskMgr           ] Will 
rerun task "FileTimer task" at [2018-04-04 14:43:29 +0200]
[2018-04-04 14:43:28.752586 +0200][Dump   ][TaskMgr           ] Running 
task: "TickGeneratorTask for: REDIRECTOR:1094"
[2018-04-04 14:43:28.752656 +0200][Dump   ][TaskMgr           ] Will 
rerun task "TickGeneratorTask for: REDIRECTOR:1094" at [2018-04-04 
14:43:43 +0200]
[2018-04-04 14:43:28.752691 +0200][Dump   ][TaskMgr           ] Running 
task: "TickGeneratorTask for: 192.168.16.120:1094"
[2018-04-04 14:43:28.752727 +0200][Dump   ][TaskMgr           ] Will 
rerun task "TickGeneratorTask for: 192.168.16.120:1094" at [2018-04-04 
14:43:43 +0200]
[2018-04-04 14:43:28.785950 +0200][Dump   ][XRootDTransport   ] 
[REDIRECTOR:1094 #0.0] Stream inactive since 30 seconds, TTL: 1200, 
allocated SIDs: 0, open files: 0
[2018-04-04 14:43:28.786026 +0200][Dump   ][XRootDTransport   ] 
[REDIRECTOR:1094 #0.0] Stream inactive since 30 seconds, stream timeout: 
60, allocated SIDs: 0, wait barrier: 2018-04-04 14:42:58 +0200
[2018-04-04 14:43:29.752822 +0200][Dump   ][TaskMgr           ] Running 
task: "FileTimer task"
[2018-04-04 14:43:29.752892 +0200][Dump   ][TaskMgr           ] Will 
rerun task "FileTimer task" at [2018-04-04 14:43:44 +0200]
[2018-04-04 14:43:40.846051 +0200][Dump   ][XRootDTransport   ] 
[192.168.16.120:1094 #0.0] Stream inactive since 15 seconds, TTL: 300, 
allocated SIDs: 4, open files: 1
[2018-04-04 14:43:40.846125 +0200][Dump   ][XRootDTransport   ] 
[192.168.16.120:1094 #0.0] Stream inactive since 15 seconds, stream 
timeout: 60, allocated SIDs: 4, wait barrier: 2018-04-04 14:42:59 +0200
[2018-04-04 14:43:43.753676 +0200][Dump   ][TaskMgr           ] Running 
task: "TickGeneratorTask for: REDIRECTOR:1094"
[2018-04-04 14:43:43.753760 +0200][Dump   ][TaskMgr           ] Will 
rerun task "TickGeneratorTask for: REDIRECTOR:1094" at [2018-04-04 
14:43:58 +0200]
[2018-04-04 14:43:43.753775 +0200][Dump   ][TaskMgr           ] Running 
task: "TickGeneratorTask for: 192.168.16.120:1094"
[2018-04-04 14:43:43.753786 +0200][Dump   ][TaskMgr           ] Will 
rerun task "TickGeneratorTask for: 192.168.16.120:1094" at [2018-04-04 
14:43:58 +0200]
[2018-04-04 14:43:43.854343 +0200][Dump   ][XRootDTransport   ] 
[REDIRECTOR:1094 #0.0] Stream inactive since 45 seconds, TTL: 1200, 
allocated SIDs: 0, open files: 0
[2018-04-04 14:43:43.854399 +0200][Dump   ][XRootDTransport   ] 
[REDIRECTOR:1094 #0.0] Stream inactive since 45 seconds, stream timeout: 
60, allocated SIDs: 0, wait barrier: 2018-04-04 14:42:58 +0200
[2018-04-04 14:43:44.753880 +0200][Dump   ][TaskMgr           ] Running 
task: "FileTimer task"
[2018-04-04 14:43:44.753958 +0200][Dump   ][TaskMgr           ] Will 
rerun task "FileTimer task" at [2018-04-04 14:43:59 +0200]
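
To locate where a stall begins in a long debug log like the one above, the time between consecutive stamped log lines can be scanned for large gaps. This is a rough sketch under stated assumptions: `find_gaps` is a hypothetical helper name, the threshold is whatever the caller chooses, only lines starting with a `[YYYY-MM-DD HH:MM:SS.ffffff` stamp are considered (wrapped continuation lines are skipped), and a log that crosses midnight is not handled.

```shell
# Sketch only: find_gaps is a hypothetical helper, not an xrootd tool.
# Usage: find_gaps THRESHOLD_SECONDS < xrdcp-debug.log
find_gaps() {
    awk -v limit="$1" '
        # Match only lines that begin with a "[YYYY-MM-DD " time stamp.
        /^\[[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
            # Field 2 is HH:MM:SS.ffffff; convert it to seconds of the day.
            split($2, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev > limit)
                printf "%.3f s of silence before %s\n", now - prev, $2
            prev = now
            seen = 1
        }
    '
}
```

Each reported line names the first stamp after a quiet period, so the log entries just before it show the last activity before the transfer went silent.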



-- 
-----------------------------------------------------------------------
Heiko Schröter
Institute of Environmental Physics (IUP)   phone: ++49-(0)421-218-62092
Institute of Remote Sensing (IFE)          fax:   ++49-(0)421-218-62070
University of Bremen (FB1)
P.O. Box 330440               email:  [log in to unmask]
Otto-Hahn-Allee 1
28359 Bremen
Germany
-----------------------------------------------------------------------
