Hi,

Unfortunately it seems to occur also for release 5 (5.0.0-1). I tried with this container:

https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/v5/xcache.singularity.def

using basically the same configuration as before, except that I commented out the `namelib` directive, since the rucioN2N plugin RPM I was using required xrootd < 5.0.0-1.

In case it helps, here is the new image:

https://cloud.physik.lmu.de/index.php/s/4AcEcFRPwXqZb9b

Thanks,
Nikolai

On 7/9/20 1:31 AM, Matevz Tadel wrote:
> On 2020-07-08 15:00, Andrew Hanushevsky wrote:
>> Could you try git head? There were issues in RC4. There might still be
>> issues, but I never saw the one you and Nikolai tripped over.
>
> Yes, I managed to reproduce it on rc4 tag with just basic gsi enabled
> ... on master / 5.0.0 it doesn't happen.
>
> Nikolai, please update to 5.0.0.
>
> Matevz
>
>> Andy
>>
>>
>> On Wed, 8 Jul 2020, Matevz Tadel wrote:
>>
>>> On 2020-07-08 14:19, Andrew Hanushevsky wrote:
>>>> What release? Git head?
>>>
>>> This was from Nikolai's image, 5-rc4.
>>>
>>> \m
>>>
>>>> On Wed, 8 Jul 2020, Matevz Tadel wrote:
>>>>
>>>>> Hi Andy,
>>>>>
>>>>> On 2020-07-08 13:49, Andrew Hanushevsky wrote:
>>>>>> Hi Matevz,
>>>>>>
>>>>>> Well, what kind of authentication? Clearly, the kind we use doesn't cause this
>>>>>> problem. It could be just a random core smash but if it's random we should see
>>>>>> various effects not just a crash in this particular code path, right?
>>>>>
>>>>> xcache without any security config, everything works smooth.
>>>>>
>>>>> xcache with sec.protocol /usr/lib64 gsi --- trouble:
>>>>>
>>>>> 200708 13:38:44 240995 XrootdXeq: matevz.241046:31@uaf-7 pub IPv4 login as d0ba0e6c.0
>>>>> 200708 13:38:44 240995 Posix_P2L: file /eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root pfn2lfn /eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>>>>
>>>>> [2020-07-08 13:38:44.739012 -0700][Error ][AsyncSock ] [[log in to unmask]:1094.0] Unable to connect: network is unreachable
>>>>> [2020-07-08 13:38:44.739092 -0700][Error ][PostMaster ] [[log in to unmask]:1094] elapsed = 0, pConnectionWindow = 120 seconds.
>>>>> [2020-07-08 13:38:45.637583 -0700][Error ][XRootDTransport ] [[log in to unmask]:1094.0] Authentication with gsi failed:
>>>>> [2020-07-08 13:38:45.974332 -0700][Error ][AsyncSock ] [[log in to unmask]:1095.0] Unable to connect: network is unreachable
>>>>> [2020-07-08 13:38:45.974400 -0700][Error ][PostMaster ] [[log in to unmask]:1095] elapsed = 0, pConnectionWindow = 120 seconds.
>>>>> 200708 13:38:46 240995 XrdPfc_Manager: info Cache::Attach() root:[log in to unmask]
>>>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::GetFile eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root, io 0xe30f50
>>>>> 200708 13:38:46 240995 XrdPfc_IO: debug IOEntireFile::initCachedStat get stat from client res = 0, size = 2272072 root:[log in to unmask]
>>>>> 200708 13:38:46 240995 XrdPfc_File: debug Creating new file info, data size = 2272072 num blocks = 3 eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::inc_ref_cnt eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root, cnt at exit = 1
>>>>> 200708 13:38:46 240995 XrdPfc_File: debug File::AddIO() io = 0xe30f50 eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::Attach() root:[log in to unmask] location: [log in to unmask]:1095
>>>>> [2020-07-08 13:38:47.022428 -0700][Error ][AsyncSock ] [[log in to unmask]:1095.0] Socket error encountered: [ERROR] Invalid arguments
>>>>> [2020-07-08 13:38:47.022506 -0700][Error ][XRootD ] [[log in to unmask]:1095] Unable to get the response to request kXR_read (handle: 0x00000000, offset: 0, size: 1048576)
>>>>> [2020-07-08 13:38:47.022625 -0700][Error ][File ] [0xf0b040@root:[log in to unmask]:1094//eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root?xrdcl.requuid=6730e10b-8b40-43bf-9d0a-75da982939e8] Fatal file state error.
>>>>> Message kXR_read (handle: 0x00000000, offset: 0, size: 1048576) returned with [ERROR] Invalid arguments
>>>>> 200708 13:38:47 241052 XrdPfc_File: error File::ProcessBlockResponse block 0xff3440 0 error=-22 eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>>>> 200708 13:38:47 240995 XrdPfc_File: error File::Read() io 0xe30f50, block 0 finished with error 22 invalid argument eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>>>> src/tcmalloc.cc:284] Attempt to free invalid pointer 0x313262003543620a
>>>>>
>>>>> Note that while we get all these network errors at the start, cache
>>>>> still got the stat info from the server (knows the size of the file).
>>>>>
>>>>> I must admit I never test xcache with auth on :( I'll try it out
>>>>> now, well, after lunch :)
>>>>>
>>>>> Matevz
>>>>>
>>>>>> Andy
>>>>>>
>>>>>>
>>>>>> On Wed, 8 Jul 2020, Matevz Tadel wrote:
>>>>>>
>>>>>>> Yay, that was a journey ... but I can reproduce it now!
>>>>>>>
>>>>>>> It is super strange this happens with xcache with authentication on only ...
>>>>>>> this really should have no effect. I first tried without it and it worked and
>>>>>>> then something rang a bell that you said so in the email :).
>>>>>>>
>>>>>>> Andy, does this ring any bells for you? It looks like an interaction between
>>>>>>> server / client usage of X509 stuff.
>>>>>>>
>>>>>>> Anyway, I'm digging on the xcache side ...
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Matevz
>>>>>>>
>>>>>>> On 2020-07-08 07:41, Nikolai Hartmann wrote:
>>>>>>>> Hi Matevz,
>>>>>>>>
>>>>>>>> I might have something like a "minimal failing example". Unfortunately
>>>>>>>> the problem only appears when authentication is required, so the example
>>>>>>>> will only work on a machine that has a valid host certificate and the
>>>>>>>> corresponding directory has to be bind-mounted into the container.
>>>>>>>>
>>>>>>>> I uploaded my container image here:
>>>>>>>>
>>>>>>>> https://urldefense.com/v3/__https://cloud.physik.lmu.de/index.php/s/RFC6Q89FBxxNMXF__;!!Mih3wA!S4S4O0y7f1Z5oNAgkr2EZ2J5683bZ5LRbG55GbcoHhyJTwOzaS2lABcIifddJxDGMy-N$
>>>>>>>>
>>>>>>>> and made a directory structure (tar archive attached) to bind mount into
>>>>>>>> the container (and containing the minimal failing xcache config and a
>>>>>>>> script for starting gdb inside the container)
>>>>>>>>
>>>>>>>> To reproduce, extract the archive, enter the directory and run (as
>>>>>>>> non-root user)
>>>>>>>>
>>>>>>>> singularity run -B $(pwd)/data:/data -B $(pwd)/config:/etc/xrootd:ro -B <hostkey-dir>:/etc/grid-security:ro <singularity-image>
>>>>>>>>
>>>>>>>> where <hostkey-dir> is a directory that contains
>>>>>>>>
>>>>>>>> hostkey.pem
>>>>>>>> hostcert.pem
>>>>>>>> vomsdir (will become X509_VOMS_DIR)
>>>>>>>> certificates (will become X509_CERT_DIR)
>>>>>>>>
>>>>>>>> and <singularity-image> is the path to the singularity image.
>>>>>>>>
>>>>>>>> That should run xrootd and the log should appear in
>>>>>>>> data/xrd/var/log/xrootd.log
>>>>>>>>
>>>>>>>> I used this example to produce the failure:
>>>>>>>>
>>>>>>>> xrdcp -f root://lcg-lrz-xcache0.grid.lrz.de:1094//root://eospublic.cern.ch//eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root /dev/null
>>>>>>>>
>>>>>>>> The simplest way to run gdb seemed to directly start xrootd with gdb.
>>>>>>>> This can be done with the script run_xcache_debug.sh in the attached
>>>>>>>> archive.
>>>>>>>> Instead of the command above just use
>>>>>>>>
>>>>>>>> singularity exec -B $(pwd)/data:/data -B $(pwd)/config:/etc/xrootd:ro -B <hostkey-dir>:/etc/grid-security:ro <singularity-image> ./run_xcache_debug.sh
>>>>>>>>
>>>>>>>> Note: Before restarting, best delete the content of the data directory
>>>>>>>> since the bug also did not seem to occur when the file was already
>>>>>>>> cached (e.g. after testing without authentication)
>>>>>>>>
>>>>>>>> Sorry for the overly complicated reproducing steps, but since it only
>>>>>>>> happened when authentication was enabled I didn't know how to do it
>>>>>>>> simpler. I hope it helps.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Nikolai
>>>>>>>>
>>>>>>>> On 7/7/20 8:42 PM, Matevz Tadel wrote:
>>>>>>>>> Thanks Nikolai, I shall continue my investigation :)
>>>>>>>>>
>>>>>>>>> Matevz
>>>>>>>>>
>>>>>>>>> On 2020-07-06 23:59, Nikolai Hartmann wrote:
>>>>>>>>>> Hi Matevz,
>>>>>>>>>>
>>>>>>>>>> Thanks a lot for looking into this.
>>>>>>>>>>
>>>>>>>>>> - The crash seems to happen always when I make a request
>>>>>>>>>> - Currently prefetching is disabled
>>>>>>>>>> - Yes, I think it is direct proxy mode
>>>>>>>>>> - stack trace is attached
>>>>>>>>>>
>>>>>>>>>> A similar setup seems to work for Ilija without issues with the xcaches
>>>>>>>>>> using slate - I tried to mimic that setup closely.
>>>>>>>>>> Running xrootd from this container image:
>>>>>>>>>>
>>>>>>>>>> https://urldefense.com/v3/__https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/xcache.singularity.def__;!!Mih3wA!SJibOzmy2P3rdD8Ut7m7gYp_bah2pQX2dR2V9U6xiTq9PoQtfjb_MHHDljpOV0aWvVYj$
>>>>>>>>>>
>>>>>>>>>> using this config
>>>>>>>>>>
>>>>>>>>>> https://urldefense.com/v3/__https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/etc/xrootd/xcache.cfg__;!!Mih3wA!SJibOzmy2P3rdD8Ut7m7gYp_bah2pQX2dR2V9U6xiTq9PoQtfjb_MHHDljpOVzHQF5CU$
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Nikolai
>>>>>>>>>>
>>>>>>>>>> On 7/7/20 1:38 AM, Matevz Tadel wrote:
>>>>>>>>>>> Hi Nikolai,
>>>>>>>>>>>
>>>>>>>>>>> I tried to reproduce it with current master in nearly all ways,
>>>>>>>>>>> with/without prefetching and with direct/forwarding mode. Also, with std
>>>>>>>>>>> malloc and tcmalloc. No luck :(
>>>>>>>>>>>
>>>>>>>>>>> Backtrace or core would help a lot at this point.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Matevz
>>>>>>>>>>>
>>>>>>>>>>> On 2020-07-03 00:54, Nikolai Hartmann wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm trying to upgrade to xrootd5 rc4 for our xcache server to mitigate a
>>>>>>>>>>>> problem with dCache.
>>>>>>>>>>>>
>>>>>>>>>>>> Now when I try to read a file through xcache it crashes with "Attempt to
>>>>>>>>>>>> free invalid pointer". I attached the corresponding part of the log.
>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Nikolai
>>>>>>>>>>>>
>>>>>>>>>>>> ########################################################################
>>>>>>>>>>>>
>>>>>>>>>>>> Use REPLY-ALL to reply to list
>>>>>>>>>>>>
>>>>>>>>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>>>>>>>>> https://urldefense.com/v3/__https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1__;!!Mih3wA!Xzk53aW-mEg2pavzme9Hd49MPmno8frpbkh2YetRsquNyAt5jiVsDB91pTNUHA$
>
> ########################################################################
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the XROOTD-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
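
For anyone repeating the reproduction steps quoted in this thread, here is a minimal sketch of a pre-flight check. It is not part of the original messages. It only verifies that <hostkey-dir> contains the four entries Nikolai lists (hostkey.pem, hostcert.pem, vomsdir, certificates) and empties the data directory, since the crash reportedly did not occur once the file was already cached. The function names check_hostkey_dir and clear_cache_dir are hypothetical helpers introduced for illustration, and the use of find with -mindepth and -delete assumes GNU or BSD find.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread): sanity checks to run
# before starting the xcache container as described in the quoted steps.

# Print every expected entry that is missing from the host key directory.
# Returns 0 if all four entries exist, 1 otherwise.
check_hostkey_dir() {
    chk_dir=$1
    chk_missing=0
    for chk_entry in hostkey.pem hostcert.pem vomsdir certificates; do
        if [ ! -e "$chk_dir/$chk_entry" ]; then
            echo "missing: $chk_entry"
            chk_missing=1
        fi
    done
    return "$chk_missing"
}

# Delete everything inside the data directory but keep the directory itself,
# so the next request is a cache miss. Assumes find supports -mindepth/-delete.
clear_cache_dir() {
    clr_dir=$1
    # Refuse anything that is not an existing directory, and refuse "/".
    [ -d "$clr_dir" ] || return 1
    [ "$clr_dir" != "/" ] || return 1
    find "$clr_dir" -mindepth 1 -delete
}
```

After sourcing these functions, something like `check_hostkey_dir <hostkey-dir> && clear_cache_dir "$(pwd)/data"` could be run before the singularity command, with <hostkey-dir> replaced by the real path.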