Hi Matevz,

Well, what kind of authentication? Clearly, the kind we use doesn't cause
this problem. It could be just a random core smash, but if it's random we
should see various effects, not just a crash in this particular code path,
right?

Andy

On Wed, 8 Jul 2020, Matevz Tadel wrote:

> Yay, that was a journey ... but I can reproduce it now!
>
> It is super strange this happens with xcache with authentication on only ...
> this really should have no effect. I first tried without it and it worked and
> then something rang a bell that you said so in the email :).
>
> Andy, does this ring any bells for you? It looks like interaction between
> server / client usage of X509 stuff.
>
> Anyway, I'm digging on the xcache side ...
>
> Cheers,
> Matevz
>
> On 2020-07-08 07:41, Nikolai Hartmann wrote:
>> Hi Matevz,
>>
>> I might have something like a "minimal failing example". Unfortunately
>> the problem only appears when authentication is required, so the example
>> will only work on a machine that has a valid host certificate and the
>> corresponding directory has to be bind-mounted into the container.
>>
>> I uploaded my container image here:
>>
>> https://urldefense.com/v3/__https://cloud.physik.lmu.de/index.php/s/RFC6Q89FBxxNMXF__;!!Mih3wA!S4S4O0y7f1Z5oNAgkr2EZ2J5683bZ5LRbG55GbcoHhyJTwOzaS2lABcIifddJxDGMy-N$
>>
>> and made a directory structure (tar archive attached) to bind mount into
>> the container (and containing the minimal failing xcache config and a
>> script for starting gdb inside the container)
>>
>> To reproduce, extract the archive, enter the directory and run (as
>> non-root user)
>>
>> singularity run -B $(pwd)/data:/data -B $(pwd)/config:/etc/xrootd:ro -B <hostkey-dir>:/etc/grid-security:ro <singularity-image>
>>
>> where <hostkey-dir> is a directory that contains
>>
>> hostkey.pem
>> hostcert.pem
>> vomsdir (will become X509_VOMS_DIR)
>> certificates (will become X509_CERT_DIR)
>>
>> and <singularity-image> is the path to the singularity image.
>>
>> That should run xrootd and the log should appear in
>> data/xrd/var/log/xrootd.log
>>
>> I used this example to produce the failure:
>>
>> xrdcp -f root://lcg-lrz-xcache0.grid.lrz.de:1094//root://eospublic.cern.ch//eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root /dev/null
>>
>> The simplest way to run gdb seemed to be to start xrootd directly under gdb.
>> This can be done with the script run_xcache_debug.sh in the attached
>> archive. Instead of the command above just use
>>
>> singularity exec -B $(pwd)/data:/data -B $(pwd)/config:/etc/xrootd:ro -B <hostkey-dir>:/etc/grid-security:ro <singularity-image> ./run_xcache_debug.sh
>>
>> Note: Before restarting, best delete the content of the data directory
>> since the bug also did not seem to occur when the file was already
>> cached (e.g. after testing without authentication)
>>
>> Sorry for the overly complicated reproducing steps, but since it only
>> happened when authentication was enabled i didn't know how to do it
>> simpler. I hope it helps.
>>
>> Thanks,
>> Nikolai
>>
>> On 7/7/20 8:42 PM, Matevz Tadel wrote:
>>> Thanks Nikolai, I shall continue my investigation :)
>>>
>>> Matevz
>>>
>>> On 2020-07-06 23:59, Nikolai Hartmann wrote:
>>>> Hi Matevz,
>>>>
>>>> Thanks a lot for looking into this.
>>>>
>>>> - The crash seems to happen always when i make a request
>>>> - Currently prefetching is disabled
>>>> - Yes, i think it is direct proxy mode
>>>> - stack trace is attached
>>>>
>>>> A similar setup seems to work for Ilija without issues with the xcaches
>>>> using slate - i tried to mimic that setup closely.
>>>> Running xrootd from this container image:
>>>>
>>>> https://urldefense.com/v3/__https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/xcache.singularity.def__;!!Mih3wA!SJibOzmy2P3rdD8Ut7m7gYp_bah2pQX2dR2V9U6xiTq9PoQtfjb_MHHDljpOV0aWvVYj$
>>>>
>>>> using this config
>>>>
>>>> https://urldefense.com/v3/__https://gitlab.physik.uni-muenchen.de/Nikolai.Hartmann/xcache-singularity-lrz/-/blob/51d2da52829eb6d8ea377539884f337208141aca/etc/xrootd/xcache.cfg__;!!Mih3wA!SJibOzmy2P3rdD8Ut7m7gYp_bah2pQX2dR2V9U6xiTq9PoQtfjb_MHHDljpOVzHQF5CU$
>>>>
>>>> Cheers,
>>>> Nikolai
>>>>
>>>> On 7/7/20 1:38 AM, Matevz Tadel wrote:
>>>>> Hi Nikolai,
>>>>>
>>>>> I tried to reproduce it with current master in nearly all ways,
>>>>> with/without prefetching and with direct/forwarding mode. Also, with std
>>>>> malloc and tcmalloc. No luck :(
>>>>>
>>>>> Backtrace or core would help a lot at this point.
>>>>>
>>>>> Cheers,
>>>>> Matevz
>>>>>
>>>>> On 2020-07-03 00:54, Nikolai Hartmann wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to upgrade to xrootd5 rc4 for our xcache server to
>>>>>> mitigate a problem with dCache.
>>>>>>
>>>>>> Now when i try to read a file through xcache it crashes with
>>>>>> "Attempt to free invalid pointer". I attached the corresponding
>>>>>> part of the log. Any ideas?
>>>>>>
>>>>>> Thanks,
>>>>>> Nikolai
>>>>>>
>>>>>> ########################################################################
>>>>>> Use REPLY-ALL to reply to list
>>>>>>
>>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>>> https://urldefense.com/v3/__https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1__;!!Mih3wA!Xzk53aW-mEg2pavzme9Hd49MPmno8frpbkh2YetRsquNyAt5jiVsDB91pTNUHA$