Yeah, that was my first suspicion too. However, OpenSSL man pages swear up and down that an explicit flush isn't needed (and I don't see any buffering). I even tried adding printf statements .. no luck!

However, I was able to make progress by manually adding Connection: close to the headers, which triggered a new error message. It appears the server is sending too little data and, when keep-alive is enabled, the client is timing out waiting for a response.

I found that, under load, XrdHttpReq is send too little data. For example, given this request:

180830 22:41:28 16434 sysXrdHttp:  rc:125 got hdr line: GET //user/ligo//frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf HTTP/1.1
180830 22:41:28 16434 sysXrdHttp:  Parsing first line: GET //user/ligo//frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf HTTP/1.1
180830 22:41:28 16434 sysXrdHttp:  rc:34 got hdr line: Host: server.example.com:1094
180830 22:41:28 16434 sysXrdHttp:  rc:34 got hdr line: Range: bytes=125829120-150994943
180830 22:41:28 16434 sysXrdHttp:  rc:13 got hdr line: Accept: */*
180830 22:41:28 16434 sysXrdHttp:  rc:24 got hdr line: Connection: Keep-Alive
180830 22:41:28 16434 sysXrdHttp:  rc:30 got hdr line: User-Agent: cvmfs Fuse 2.6.0
180830 22:41:28 16434 sysXrdHttp:  rc:124 got hdr line: cvmfs-info: Part%20of%20/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 sysXrdHttp:  rc:19 got hdr line: Connection: close

Given that byte range, we expect 23MB to be sent. Since Xrootd internally breaks this into 1MB chunks, that means there should be a total of 23 callbacks.

Sure enough, the first few look fine:

180830 22:41:28 16434 XrootdBridge: ligo.85978:126@hcc-briantest7 login as ligo
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp:  Process. lp:0x7f1369562028 reqstate: 0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_stat:  fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 acc_Audit: http grant  ligo@[2600:900:6:1301:5054:ff:fe0b:9cba] stat /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 sysXrdHttp:  XrdHttpReq::Data! final=0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Stat for GET /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf stat=0 513145657
 24 1520985809
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp:  Process. lp:0 reqstate: 0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_open: 0-600 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 acc_Audit: http grant  ligo@[2600:900:6:1301:5054:ff:fe0b:9cba] read /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_fstat:  fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 sysXrdHttp:  XrdHttpReq::Data! final=1
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 1
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: fhandle:0:0:0:0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Sending resp: 206 header len:128
180830 22:41:28 16434 sysXrdHttp: Sending 128 bytes
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp:  Process. lp:0 reqstate: 1
180830 22:41:28 16434 sysXrdHttp:  XrdBridge::SetSF(false) failed.
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0

However, look at the end:

180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_read: 1048576@128974848 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 sysXrdHttp:  XrdHttpReq::Data! final=1
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 5
180830 22:41:28 16434 VOMS failure (5): VOMS extension not found!
180830 22:41:28 16434 sysXrdHttp: Sending 1048576 bytes
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp:  Process. lp:0 reqstate: 5
180830 22:41:28 16434 sysXrdHttp:  XrdBridge::SetSF(false) failed.
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_read: 1048576@130023424 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 sysXrdHttp:  XrdHttpReq::Data! final=1
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 6
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Got data vectors to send:1
180830 22:41:28 16434 sysXrdHttp: Sending 1048576 bytes
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp:  Process. lp:0 reqstate: 6
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:1
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_close: use=1 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_close: use=0 fn=dummy
180830 22:41:28 16434 sysXrdHttp:  XrdHttpReq::Done
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 7

Particularly, XrdHttpReq::ProcessHTTPReq is returning 1 and issuing a ofs_close command -- after only 5MB of data is sent out of 23MB requested!

The relevant code block is guarded by:

          if ( ((reqstate == 3) && (rwOps.size() > 1)) ||
            (writtenbytes >= length) ) {

So, writtenbytes >= length.

For this request, I see no possible way writtenbytes or length are incorrectly calculated, nor do I see any way writtenbytes >= length. A mystery thus far... but hopefully a pretty narrow one!


You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.

{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/xrootd/xrootd","title":"xrootd/xrootd","subtitle":"GitHub repository","main_image_url":"https://assets-cdn.github.com/images/email/message_cards/header.png","avatar_image_url":"https://assets-cdn.github.com/images/email/message_cards/avatar.png","action":{"name":"Open in GitHub","url":"https://github.com/xrootd/xrootd"}},"updates":{"snippets":[{"icon":"PERSON","message":"@bbockelm in #813: Yeah, that was my first suspicion too. However, OpenSSL man pages swear up and down that an explicit flush isn't needed (and I don't see any buffering). I even tried adding printf statements .. no luck!\r\n\r\nHowever, I was able to make progress by manually adding `Connection: close` to the headers, which triggered a new error message. It appears the server is sending too little data and, when `keep-alive` is enabled, the client is timing out waiting for a response.\r\n\r\nI found that, under load, `XrdHttpReq` is send too little data. For example, given this request:\r\n```\r\n180830 22:41:28 16434 sysXrdHttp: rc:125 got hdr line: GET //user/ligo//frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf HTTP/1.1\r\n180830 22:41:28 16434 sysXrdHttp: Parsing first line: GET //user/ligo//frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf HTTP/1.1\r\n180830 22:41:28 16434 sysXrdHttp: rc:34 got hdr line: Host: server.example.com:1094\r\n180830 22:41:28 16434 sysXrdHttp: rc:34 got hdr line: Range: bytes=125829120-150994943\r\n180830 22:41:28 16434 sysXrdHttp: rc:13 got hdr line: Accept: */*\r\n180830 22:41:28 16434 sysXrdHttp: rc:24 got hdr line: Connection: Keep-Alive\r\n180830 22:41:28 16434 sysXrdHttp: rc:30 got hdr line: User-Agent: cvmfs Fuse 2.6.0\r\n180830 22:41:28 16434 sysXrdHttp: rc:124 got hdr line: cvmfs-info: Part%20of%20/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 sysXrdHttp: rc:19 got hdr line: Connection: close\r\n```\r\n\r\nGiven that byte range, we expect 23MB to be sent. Since Xrootd internally breaks this into 1MB chunks, that means there should be a total of 23 callbacks.\r\n\r\nSure enough, the first few look fine:\r\n```\r\n180830 22:41:28 16434 XrootdBridge: ligo.85978:126@hcc-briantest7 login as ligo\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0x7f1369562028 reqstate: 0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_stat: fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 acc_Audit: http grant ligo@[2600:900:6:1301:5054:ff:fe0b:9cba] stat /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Stat for GET /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf stat=0 513145657\r\n 24 1520985809\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_open: 0-600 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 acc_Audit: http grant ligo@[2600:900:6:1301:5054:ff:fe0b:9cba] read /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_fstat: fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=1\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 1\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: fhandle:0:0:0:0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Sending resp: 206 header len:128\r\n180830 22:41:28 16434 sysXrdHttp: Sending 128 bytes\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 1\r\n180830 22:41:28 16434 sysXrdHttp: XrdBridge::SetSF(false) failed.\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0\r\n```\r\n\r\nHowever, look at the end:\r\n```\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_read: 1048576@128974848 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=1\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 5\r\n180830 22:41:28 16434 VOMS failure (5): VOMS extension not found!\r\n180830 22:41:28 16434 sysXrdHttp: Sending 1048576 bytes\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 5\r\n180830 22:41:28 16434 sysXrdHttp: XrdBridge::SetSF(false) failed.\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_read: 1048576@130023424 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=1\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 6\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Got data vectors to send:1\r\n180830 22:41:28 16434 sysXrdHttp: Sending 1048576 bytes\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 6\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:1\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_close: use=1 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_close: use=0 fn=dummy\r\n180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Done\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 7\r\n```\r\n\r\nParticularly, `XrdHttpReq::ProcessHTTPReq` is returning `1` and issuing a `ofs_close` command -- after only 5MB of data is sent out of 23MB requested!\r\n\r\nThe relevant code block is guarded by:\r\n```\r\n if ( ((reqstate == 3) \u0026\u0026 (rwOps.size() \u003e 1)) ||\r\n (writtenbytes \u003e= length) ) {\r\n```\r\n\r\nSo, `writtenbytes \u003e= length`.\r\n\r\nFor this request, I see no possible way `writtenbytes` or `length` are incorrectly calculated, nor do I see any way `writtenbytes \u003e= length`. A mystery thus far... but hopefully a pretty narrow one!"}],"action":{"name":"View Issue","url":"https://github.com/xrootd/xrootd/issues/813#issuecomment-417655140"}}} [ { "@context": "http://schema.org", "@type": "EmailMessage", "potentialAction": { "@type": "ViewAction", "target": "https://github.com/xrootd/xrootd/issues/813#issuecomment-417655140", "url": "https://github.com/xrootd/xrootd/issues/813#issuecomment-417655140", "name": "View Issue" }, "description": "View this Issue on GitHub", "publisher": { "@type": "Organization", "name": "GitHub", "url": "https://github.com" } }, { "@type": "MessageCard", "@context": "http://schema.org/extensions", "hideOriginalBody": "false", "originator": "AF6C5A86-E920-430C-9C59-A73278B5EFEB", "title": "Re: [xrootd/xrootd] XrdHttp loses requests under modest concurrency - v2 (#813)", "sections": [ { "text": "", "activityTitle": "**Brian Bockelman**", "activityImage": "https://assets-cdn.github.com/images/email/message_cards/avatar.png", "activitySubtitle": "@bbockelm", "facts": [ ] } ], "potentialAction": [ { "name": "Add a comment", "@type": "ActionCard", "inputs": [ { "isMultiLine": true, "@type": "TextInput", "id": "IssueComment", "isRequired": false } ], "actions": [ { "name": "Comment", "@type": "HttpPOST", "target": "https://api.github.com", "body": "{\n\"commandName\": \"IssueComment\",\n\"repositoryFullName\": \"xrootd/xrootd\",\n\"issueId\": 813,\n\"IssueComment\": \"{{IssueComment.value}}\"\n}" } ] }, { "name": "Close issue", "@type": "HttpPOST", "target": "https://api.github.com", "body": "{\n\"commandName\": \"IssueClose\",\n\"repositoryFullName\": \"xrootd/xrootd\",\n\"issueId\": 813\n}" }, { "targets": [ { "os": "default", "uri": "https://github.com/xrootd/xrootd/issues/813#issuecomment-417655140" } ], "@type": "OpenUri", "name": "View on GitHub" }, { "name": "Unsubscribe", "@type": "HttpPOST", "target": "https://api.github.com", "body": "{\n\"commandName\": \"MuteNotification\",\n\"threadId\": 373849645\n}" } ], "themeColor": "26292E" } ]

Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1