When using XRootD 4.12.6-1 (osg version) for our XRootD data servers, we notice OOM issues. This doesn't happen when we use 4.11 (we reverted back to that version)

(primeradiant06)
[Mon Mar  8 13:16:45 2021][316012.240505] xrootd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0

[Mon Mar  8 13:16:45 2021][316012.248957] xrootd cpuset=/ mems_allowed=0-1

[Mon Mar  8 13:16:45 2021][316012.253811] CPU: 4 PID: 2890 Comm: xrootd Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.15.2.el7.x86_64 #1

[Mon Mar  8 13:16:45 2021][316012.266621] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 2.11.0 11/02/2019
[Mon Mar  8 13:16:45 2021][316012.275358] Call Trace:
[Mon Mar  8 13:16:45 2021][316012.278184]  [<ffffffffac381fba>] dump_stack+0x19/0x1b
[Mon Mar  8 13:16:45 2021][316012.284014]  [<ffffffffac37c8da>] dump_header+0x90/0x229
[Mon Mar  8 13:16:45 2021][316012.290042]  [<ffffffffabd065a2>] ? ktime_get_ts64+0x52/0xf0
[Mon Mar  8 13:16:45 2021][316012.296457]  [<ffffffffabd5dd7f>] ? delayacct_end+0x8f/0xb0
[Mon Mar  8 13:16:45 2021][316012.302773]  [<ffffffffabdc227d>] oom_kill_process+0x2cd/0x490
[Mon Mar  8 13:16:45 2021][316012.309380]  [<ffffffffabdc1c6d>] ? oom_unkillable_task+0xcd/0x120
[Mon Mar  8 13:16:45 2021][316012.316375]  [<ffffffffabdc296a>] out_of_memory+0x31a/0x500
[Mon Mar  8 13:16:45 2021][316012.322691]  [<ffffffffac37d3f7>] __alloc_pages_slowpath+0x5db/0x729
[Mon Mar  8 13:16:45 2021][316012.329878]  [<ffffffffabdc8ee6>] __alloc_pages_nodemask+0x436/0x450
[Mon Mar  8 13:16:45 2021][316012.337066]  [<ffffffffabe18bb8>] alloc_pages_current+0x98/0x110
[Mon Mar  8 13:16:45 2021][316012.343866]  [<ffffffffabdbdd37>] __page_cache_alloc+0x97/0xb0
[Mon Mar  8 13:16:45 2021][316012.350475]  [<ffffffffabdc0cd0>] filemap_fault+0x270/0x420
[Mon Mar  8 13:16:45 2021][316012.356799]  [<ffffffffc05c491e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
[Mon Mar  8 13:16:45 2021][316012.364192]  [<ffffffffc05c4b1c>] xfs_filemap_fault+0x2c/0x30 [xfs]
[Mon Mar  8 13:16:45 2021][316012.371288]  [<ffffffffabdedfba>] __do_fault.isra.61+0x8a/0x100
[Mon Mar  8 13:16:45 2021][316012.377991]  [<ffffffffabdee56c>] do_read_fault.isra.63+0x4c/0x1b0
[Mon Mar  8 13:16:45 2021][316012.384985]  [<ffffffffabdf5db0>] handle_mm_fault+0xa20/0xfb0
[Mon Mar  8 13:16:45 2021][316012.391497]  [<ffffffffabe4efde>] ? do_readv_writev+0x19e/0x260
[Mon Mar  8 13:16:45 2021][316012.398191]  [<ffffffffac38f653>] __do_page_fault+0x213/0x500
[Mon Mar  8 13:16:45 2021][316012.404699]  [<ffffffffac38f975>] do_page_fault+0x35/0x90
[Mon Mar  8 13:16:45 2021][316012.410819]  [<ffffffffac38b778>] page_fault+0x28/0x30
[Mon Mar  8 13:16:45 2021][316012.416649] Mem-Info:
[Mon Mar  8 13:16:45 2021][316012.419282] active_anon:15536411 inactive_anon:5562 isolated_anon:0
[Mon Mar  8 13:16:45 2021][316012.419282]  active_file:0 inactive_file:0 isolated_file:0
[Mon Mar  8 13:16:45 2021][316012.419282]  unevictable:0 dirty:0 writeback:0 unstable:0
[Mon Mar  8 13:16:45 2021][316012.419282]  slab_reclaimable:217834 slab_unreclaimable:33046
[Mon Mar  8 13:16:45 2021][316012.419282]  mapped:4831 shmem:6745 pagetables:34002 bounce:0
[Mon Mar  8 13:16:45 2021][316012.419282]  free:41893 free_pcp:0 free_cma:0

The client side messages are the following:

cmd: 08-Mar-2021 13:39:24 EST  Initiating request to open file root://deepthought.crc.nd.edu//store/user/kmohrman/postLHE_step/FullR2Studies/ULChecks/ttXJet-tXq_testFullULWFonCRC_ULCheck_UL17/v5/gen_step_tHq4f_testUpdateGenproddim6TopMay20GST_run2/GEN-00000_242.root
>> cmd: 08-Mar-2021 13:39:29 EST  Successfully opened file root://deepthought.crc.nd.edu//store/user/kmohrman/postLHE_step/FullR2Studies/ULChecks/ttXJet-tXq_testFullULWFonCRC_ULCheck_UL17/v5/gen_step_tHq4f_testUpdateGenproddim6TopMay20GST_run2/GEN-00000_242.root
>> cmd: Begin processing the 1st record. Run 1, Event 1, LumiSection 93 on stream 4 at 08-Mar-2021 13:41:25.583 EST
>> cmd: Begin processing the 2nd record. Run 1, Event 2, LumiSection 93 on stream 3 at 08-Mar-2021 13:41:25.586 EST
>> cmd: Begin processing the 3rd record. Run 1, Event 3, LumiSection 93 on stream 0 at 08-Mar-2021 13:41:25.589 EST
>> cmd: Begin processing the 4th record. Run 1, Event 4, LumiSection 93 on stream 2 at 08-Mar-2021 13:41:25.592 EST
>> cmd: Begin processing the 5th record. Run 1, Event 5, LumiSection 93 on stream 5 at 08-Mar-2021 13:41:25.596 EST
>> cmd: Begin processing the 6th record. Run 1, Event 6, LumiSection 93 on stream 1 at 08-Mar-2021 13:41:25.599 EST
>> cmd: Begin processing the 7th record. Run 1, Event 7, LumiSection 93 on stream 2 at 08-Mar-2021 13:42:14.144 EST
>> cmd: Begin processing the 8th record. Run 1, Event 8, LumiSection 93 on stream 1 at 08-Mar-2021 13:42:18.072 EST
>> cmd: Begin processing the 9th record. Run 1, Event 9, LumiSection 93 on stream 4 at 08-Mar-2021 13:42:26.414 EST
>> cmd: Begin processing the 10th record. Run 1, Event 10, LumiSection 93 on stream 0 at 08-Mar-2021 13:42:48.198 EST
>> cmd: Begin processing the 11th record. Run 1, Event 11, LumiSection 93 on stream 2 at 08-Mar-2021 13:43:03.165 EST
>> cmd: Begin processing the 12th record. Run 1, Event 12, LumiSection 93 on stream 3 at 08-Mar-2021 13:43:05.578 EST
>> cmd: Begin processing the 13th record. Run 1, Event 13, LumiSection 93 on stream 5 at 08-Mar-2021 13:43:07.977 EST
>> cmd: Begin processing the 14th record. Run 1, Event 14, LumiSection 93 on stream 4 at 08-Mar-2021 13:43:16.565 EST
>> cmd: Begin processing the 15th record. Run 1, Event 15, LumiSection 93 on stream 1 at 08-Mar-2021 13:43:19.416 EST
>> cmd: Begin processing the 16th record. Run 1, Event 16, LumiSection 93 on stream 0 at 08-Mar-2021 13:43:43.391 EST
>> cmd: Begin processing the 17th record. Run 1, Event 17, LumiSection 93 on stream 2 at 08-Mar-2021 13:43:44.128 EST
>> cmd: Begin processing the 18th record. Run 1, Event 18, LumiSection 93 on stream 4 at 08-Mar-2021 13:43:53.116 EST
>> cmd: Begin processing the 19th record. Run 1, Event 19, LumiSection 93 on stream 3 at 08-Mar-2021 13:44:02.181 EST
>> cmd: Begin processing the 20th record. Run 1, Event 20, LumiSection 93 on stream 5 at 08-Mar-2021 13:44:20.865 EST
>> cmd: Begin processing the 21st record. Run 1, Event 21, LumiSection 93 on stream 2 at 08-Mar-2021 13:44:30.397 EST
>> cmd: [2021-03-08 13:47:41.409148 -0500][Error  ][XRootD            ] [primeradiant06.crc.nd.edu:1096] Unable to get the response to request kXR_readv (handle: 0x00000000, chunks: 39, total size: 18360925)
>> cmd: [2021-03-08 13:47:41.410099 -0500][Error  ][File              ] [0x91851600@root://deepthought.crc.nd.edu:1094//store/user/kmohrman/postLHE_step/FullR2Studies/ULChecks/ttXJet-tXq_testFullULWFonCRC_ULCheck_UL17/v5/gen_step_tHq4f_testUpdateGenproddim6TopMay20GST_run2/GEN-00000_242.root] Fatal file state error. Message kXR_readv (handle: 0x00000000, chunks: 39, total size: 18360925) returned with [ERROR] Operation expired
>> cmd: %MSG-w XrdAdaptorInternal:  (NoModuleName) 08-Mar-2021 13:47:41 EST pre-events
>> cmd: XrdRequestManager::handle(name='root://deepthought.crc.nd.edu//store/user/kmohrman/postLHE_step/FullR2Studies/ULChecks/ttXJet-tXq_testFullULWFonCRC_ULCheck_UL17/v5/gen_step_tHq4f_testUpdateGenproddim6TopMay20GST_run2/GEN-00000_242.root) failure when reading from primeradiant06.crc.nd.edu:1096 (site T3_US_NotreDame); failed with error '[ERROR] Operation expired' (errno=0, code=206).
>> cmd: %MSG```


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.

[ { "@context": "http://schema.org", "@type": "EmailMessage", "potentialAction": { "@type": "ViewAction", "target": "https://github.com/xrootd/xrootd/issues/1430", "url": "https://github.com/xrootd/xrootd/issues/1430", "name": "View Issue" }, "description": "View this Issue on GitHub", "publisher": { "@type": "Organization", "name": "GitHub", "url": "https://github.com" } } ]

Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1