Could you try git head? There were issues in RC4. There might still be
issues, but I never saw the one you and Nikolai tripped over.

Andy

On Wed, 8 Jul 2020, Matevz Tadel wrote:

> On 2020-07-08 14:19, Andrew Hanushevsky wrote:
>> What release? Git head?
>
> This was from Nikolai's image, 5-rc4.
>
> \m
>
>> On Wed, 8 Jul 2020, Matevz Tadel wrote:
>>
>>> Hi Andy,
>>>
>>> On 2020-07-08 13:49, Andrew Hanushevsky wrote:
>>>> Hi Matevz,
>>>>
>>>> Well, what kind of authentication? Clearly, the kind we use doesn't cause
>>>> this problem. It could be just a random core smash, but if it's random we
>>>> should see various effects, not just a crash in this particular code path,
>>>> right?
>>>
>>> xcache without any security config: everything works smoothly.
>>>
>>> xcache with sec.protocol /usr/lib64 gsi --- trouble:
>>>
>>> 200708 13:38:44 240995 XrootdXeq: matevz.241046:31@uaf-7 pub IPv4 login as d0ba0e6c.0
>>> 200708 13:38:44 240995 Posix_P2L: file /eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root pfn2lfn /eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>> [2020-07-08 13:38:44.739012 -0700][Error ][AsyncSock ] [[log in to unmask]:1094.0] Unable to connect: network is unreachable
>>> [2020-07-08 13:38:44.739092 -0700][Error ][PostMaster ] [[log in to unmask]:1094] elapsed = 0, pConnectionWindow = 120 seconds.
>>> [2020-07-08 13:38:45.637583 -0700][Error ][XRootDTransport ] [[log in to unmask]:1094.0] Authentication with gsi failed:
>>> [2020-07-08 13:38:45.974332 -0700][Error ][AsyncSock ] [[log in to unmask]:1095.0] Unable to connect: network is unreachable
>>> [2020-07-08 13:38:45.974400 -0700][Error ][PostMaster ] [[log in to unmask]:1095] elapsed = 0, pConnectionWindow = 120 seconds.
>>> 200708 13:38:46 240995 XrdPfc_Manager: info Cache::Attach() root:[log in to unmask]
>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::GetFile eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root, io 0xe30f50
>>> 200708 13:38:46 240995 XrdPfc_IO: debug IOEntireFile::initCachedStat get stat from client res = 0, size = 2272072 root:[log in to unmask]
>>> 200708 13:38:46 240995 XrdPfc_File: debug Creating new file info, data size = 2272072 num blocks = 3 eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::inc_ref_cnt eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root, cnt at exit = 1
>>> 200708 13:38:46 240995 XrdPfc_File: debug File::AddIO() io = 0xe30f50 eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>> 200708 13:38:46 240995 XrdPfc_Manager: debug Cache::Attach() root:[log in to unmask] location: [log in to unmask]:1095
>>> [2020-07-08 13:38:47.022428 -0700][Error ][AsyncSock ] [[log in to unmask]:1095.0] Socket error encountered: [ERROR] Invalid arguments
>>> [2020-07-08 13:38:47.022506 -0700][Error ][XRootD ] [[log in to unmask]:1095] Unable to get the response to request kXR_read (handle: 0x00000000, offset: 0, size: 1048576)
>>> [2020-07-08 13:38:47.022625 -0700][Error ][File ] [0xf0b040@root:[log in to unmask]:1094//eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root?xrdcl.requuid=6730e10b-8b40-43bf-9d0a-75da982939e8] Fatal file state error.
>>> Message kXR_read (handle: 0x00000000, offset: 0, size: 1048576) returned with [ERROR] Invalid arguments
>>> 200708 13:38:47 241052 XrdPfc_File: error File::ProcessBlockResponse block 0xff3440 0 error=-22 eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>> 200708 13:38:47 240995 XrdPfc_File: error File::Read() io 0xe30f50, block 0 finished with error 22 invalid argument eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root
>>> src/tcmalloc.cc:284] Attempt to free invalid pointer 0x313262003543620a
>>>
>>> Note that while we get all these network errors at the start, the cache
>>> still got the stat info from the server (it knows the size of the file).
>>>
>>> I must admit I never test xcache with auth on :( I'll try it out now,
>>> well, after lunch :)
>>>
>>> Matevz
>>>
>>>> Andy
>>>>
>>>>
>>>> On Wed, 8 Jul 2020, Matevz Tadel wrote:
>>>>
>>>>> Yay, that was a journey ... but I can reproduce it now!
>>>>>
>>>>> It is super strange that this happens only with xcache with
>>>>> authentication on ... this really should have no effect. I first tried
>>>>> without it and it worked, and then something rang a bell that you said
>>>>> so in the email :).
>>>>>
>>>>> Andy, does this ring any bells for you? It looks like an interaction
>>>>> between server / client usage of X509 stuff.
>>>>>
>>>>> Anyway, I'm digging on the xcache side ...
>>>>>
>>>>> Cheers,
>>>>> Matevz
>>>>>
>>>>>
>>>>>
>>>>> On 2020-07-08 07:41, Nikolai Hartmann wrote:
>>>>>> Hi Matevz,
>>>>>>
>>>>>> I might have something like a "minimal failing example". Unfortunately
>>>>>> the problem only appears when authentication is required, so the
>>>>>> example will only work on a machine that has a valid host certificate,
>>>>>> and the corresponding directory has to be bind-mounted into the
>>>>>> container.
>>>>>>
>>>>>> I uploaded my container image here:
>>>>>>
>>>>>> https://urldefense.com/v3/__https://cloud.physik.lmu.de/index.php/s/RFC6Q89FBxxNMXF__;!!Mih3wA!S4S4O0y7f1Z5oNAgkr2EZ2J5683bZ5LRbG55GbcoHhyJTwOzaS2lABcIifddJxDGMy-N$
>>>>>>
>>>>>> and made a directory structure (tar archive attached) to bind mount
>>>>>> into the container (it contains the minimal failing xcache config and a
>>>>>> script for starting gdb inside the container).
>>>>>>
>>>>>> To reproduce, extract the archive, enter the directory and run (as
>>>>>> non-root user)
>>>>>>
>>>>>> singularity run -B $(pwd)/data:/data -B $(pwd)/config:/etc/xrootd:ro -B <hostkey-dir>:/etc/grid-security:ro <singularity-image>
>>>>>>
>>>>>> where <hostkey-dir> is a directory that contains
>>>>>>
>>>>>> hostkey.pem
>>>>>> hostcert.pem
>>>>>> vomsdir (will become X509_VOMS_DIR)
>>>>>> certificates (will become X509_CERT_DIR)
>>>>>>
>>>>>> and <singularity-image> is the path to the singularity image.
>>>>>>
>>>>>> That should run xrootd and the log should appear in
>>>>>> data/xrd/var/log/xrootd.log
>>>>>>
>>>>>> I used this example to produce the failure:
>>>>>>
>>>>>> xrdcp -f root://lcg-lrz-xcache0.grid.lrz.de:1094//root://eospublic.cern.ch//eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root /dev/null
>>>>>>
>>>>>> The simplest way to run gdb seemed to be to start xrootd directly under
>>>>>> gdb. This can be done with the script run_xcache_debug.sh in the attached
>>>>>> archive.
>>>>>> Instead of the command above just use
>>>>>>
>>>>>> singularity exec -B $(pwd)/data:/data -B $(pwd)/config:/etc/xrootd:ro -B <hostkey-dir>:/etc/grid-security:ro <singularity-image> ./run_xcache_debug.sh
>>>>>>
>>>>>> Note: Before restarting, it is best to delete the content of the data
>>>>>> directory, since the bug also did not seem to occur when the file was
>>>>>> already cached (e.g. after testing without authentication).
>>>>>>
>>>>>> Sorry for the overly complicated reproduction steps, but since it only
>>>>>> happened when authentication was enabled I didn't know how to make it
>>>>>> simpler. I hope it helps.
>>>>>>
>>>>>> Thanks,
>>>>>> Nikolai
>>>>>>
>>>>>> On 7/7/20 8:42 PM, Matevz Tadel wrote:
>>>>>>> Thanks Nikolai, I shall continue my investigation :)
>>>>>>>
>>>>>>> Matevz
>>>>>>>
>>>>>>> On 2020-07-06 23:59, Nikolai Hartmann wrote:
>>>>>>>> Hi Matevz,
>>>>>>>>
>>>>>>>> Thanks a lot for looking into this.
>>>>>>>>
>>>>>>>> - The crash seems to happen every time I make a request
>>>>>>>> - Currently prefetching is disabled
>>>>>>>> - Yes, I think it is direct proxy mode
>>>>>>>> - stack trace is attached
>>>>>>>>
>>>>>>>> A similar setup seems to work for Ilija without issues with the
>>>>>>>> xcaches using slate - I tried to mimic that setup closely.
>>>>>>>> Running xrootd from this container image:
>>>>>>>>
>>>>>>>> https://urldefense.com/v3/__https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/xcache.singularity.def__;!!Mih3wA!SJibOzmy2P3rdD8Ut7m7gYp_bah2pQX2dR2V9U6xiTq9PoQtfjb_MHHDljpOV0aWvVYj$
>>>>>>>>
>>>>>>>> using this config
>>>>>>>>
>>>>>>>> https://urldefense.com/v3/__https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/etc/xrootd/xcache.cfg__;!!Mih3wA!SJibOzmy2P3rdD8Ut7m7gYp_bah2pQX2dR2V9U6xiTq9PoQtfjb_MHHDljpOVzHQF5CU$
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Nikolai
>>>>>>>>
>>>>>>>> On 7/7/20 1:38 AM, Matevz Tadel wrote:
>>>>>>>>> Hi Nikolai,
>>>>>>>>>
>>>>>>>>> I tried to reproduce it with current master in nearly all ways,
>>>>>>>>> with/without prefetching and with direct/forwarding mode. Also, with
>>>>>>>>> std malloc and tcmalloc. No luck :(
>>>>>>>>>
>>>>>>>>> Backtrace or core would help a lot at this point.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Matevz
>>>>>>>>>
>>>>>>>>> On 2020-07-03 00:54, Nikolai Hartmann wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm trying to upgrade to xrootd5 rc4 for our xcache server to
>>>>>>>>>> mitigate a problem with dCache.
>>>>>>>>>>
>>>>>>>>>> Now when I try to read a file through xcache it crashes with
>>>>>>>>>> "Attempt to free invalid pointer". I attached the corresponding
>>>>>>>>>> part of the log.
>>>>>>>>>> Any ideas?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Nikolai
>>>>>>>>>>
>>>>>>>>>> ########################################################################
>>>>>>>>>>
>>>>>>>>>> Use REPLY-ALL to reply to list
>>>>>>>>>>
>>>>>>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>>>>>>> https://urldefense.com/v3/__https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1__;!!Mih3wA!Xzk53aW-mEg2pavzme9Hd49MPmno8frpbkh2YetRsquNyAt5jiVsDB91pTNUHA$