Hi Raja,

Can you clarify whether your batch jobs read from XCache or not? If so, how is the XCache set up? In addition to the cache size, one important parameter in XCache is “pfc.blocksize”. You may want to set it to something large, like 64MB (close to or equal to the CEPH data block size). For an explanation of “pfc.blocksize”, see https://xrootd.slac.stanford.edu/doc/dev50/pss_config.htm#_Toc42114746.
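
In case it helps, a minimal XCache (proxy file cache) configuration along those lines might look something like the sketch below. This is only a sketch: the cache path and the RAM/disk numbers are placeholders, the origin host is just taken from the host in your logs, and you should check the pss/pfc documentation for the exact directives and allowed values in your xrootd version.

# Sketch of an XCache configuration - placeholder values, adjust to your setup
ofs.osslib    libXrdPss.so
pss.cachelib  libXrdPfc.so

# Origin server to fetch data from (host taken from your logs, as an example)
pss.origin    xrootd.echo.stfc.ac.uk:1094

# Local disk area holding the cache (placeholder path)
oss.localroot /data/xcache

# Fetch data from the origin in 64MB blocks, close to the CEPH block size
pfc.blocksize 64m

# RAM buffer and disk usage watermarks for the cache (placeholder values)
pfc.ram       32g
pfc.diskusage 0.90 0.95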

If LHCb uses the ROOT data format, then the data reading is not sequential. Without a cache, or if the cache keeps thrashing data (because it is too small), adjacent reads will cause CEPH to deliver the same data block multiple times. Think of the following scenario:

read(offset=0, len=10)
read(offset=11, len=10)

Without a cache, CEPH will fetch the first 64MB block for the first read, and then fetch the same 64MB again for the second read. The frontend cache smooths this out, so CEPH only needs to deliver the 64MB block once.
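
To make the arithmetic concrete with a 64MB block size: both reads land in block 0, since 0/64MB and 11/64MB both round down to 0. Without a cache, each read triggers its own fetch of block 0 from CEPH, so roughly 128MB is moved to serve about 20 bytes of payload; with a frontend cache, block 0 is fetched once and the second read is served from the cache.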

If I had to guess, the rest of the issues you reported are likely related to xrootd being overwhelmed.

regards,
--
Wei Yang  |  [log in to unmask]  |  650-926-3338(O)

On 7/22/20, 2:59 PM, "Raja Nandakumar" <[log in to unmask]> wrote:


Dear @bbockelm<https://github.com/bbockelm> and @wyang007<https://github.com/wyang007>

Many thanks for your responses. To start with, I am the RAL contact who is looking at this, with support from the ECHO team. My expertise is more in some of the LHCb software, though I will be happy to feed back suggestions or include the relevant RAL administrators in this discussion as needed. Unfortunately there is not much expertise on xrootd within RAL, so we need help from you in solving this issue!

To answer @wyang007<https://github.com/wyang007>: my jobs are running within RAL on dedicated worker nodes which replicate the production setup. These machines run only test jobs, so we can tweak the settings here to see if the issue gets fixed. The cache is a few hundred GB in size. We have this test system because LHCb have observed this issue on a large scale and we are trying to solve it.

@bbockelm<https://github.com/bbockelm> For my understanding, why doesn't xrootd reopen the file? We see quite a few "socket errors" in general, but these are never (apparently) fatal: the socket closes and then reopens. What happens in these cases such that the process is terminated instead? For example, we see:

[2020-07-22 20:42:20.822408 +0000][Error  ][AsyncSock         ] [xrootd.echo.stfc.ac.uk:1094 #0.0] Socket error while handshaking: [ERROR] Socket timeout

[2020-07-22 20:42:20.822418 +0000][Debug  ][AsyncSock         ] [xrootd.echo.stfc.ac.uk:1094 #0.0] Closing the socket

[2020-07-22 20:42:20.822431 +0000][Debug  ][Poller            ] <[::ffff:172.28.5.33]:60726><--><[::ffff:172.28.1.1]:1094> Removing socket from the poller

[2020-07-22 20:42:20.822510 +0000][Error  ][PostMaster        ] [xrootd.echo.stfc.ac.uk:1094 #0] elapsed = 108, pConnectionWindow = 120 seconds.

[2020-07-22 20:42:20.822532 +0000][Info   ][PostMaster        ] [xrootd.echo.stfc.ac.uk:1094 #0] Attempting reconnection in 12 seconds.

[2020-07-22 20:42:20.822546 +0000][Debug  ][TaskMgr           ] Registering task: "StreamConnectorTask for xrootd.echo.stfc.ac.uk:1094 #0" to be run at: [2020-07-22 20:42:32 +0000]

[2020-07-22 20:42:26.822614 +0000][Dump   ][TaskMgr           ] Running task: "FileTimer task"

[2020-07-22 20:42:26.822672 +0000][Dump   ][File              ] [0x2ce62820@root://xrootd.echo.stfc.ac.uk:1094/lhcb:prod/lhcb/LHCb/Collision16/FULLTURBO.DST/00052099/0000/00052099_00001653_2.fullturbo.dst] Got a timer event

The file reads are in general not random, I suppose - in normal operations we would read from the beginning to the end of the file, looping over the events, and my test jobs do exactly that.

I am not sure I understand why the streaming should essentially make a local copy of the whole file in one go. Doesn't this negate the whole point of streaming? Apologies if this has been discussed before, but I am not completely aware of the detailed rationale behind the choices made in the system, except in a general sense.

It would be ideal if this problem could be made to "go away" by fine-tuning the caching layer. For this, it would be nice to have a feel for which variables affect the caching layer - it would be great if you could point us in that direction. Of course we will be happy to consider other solutions if that is the best way forward. It is clear, however, that the current situation cannot remain the status quo.

Many thanks and Cheers,
Raja.
