Yeah, that was my first suspicion too. However, OpenSSL man pages swear up and down that an explicit flush isn't needed (and I don't see any buffering). I even tried adding printf statements .. no luck!
However, I was able to make progress by manually adding Connection: close
to the headers, which triggered a new error message. It appears the server is sending too little data and, when keep-alive
is enabled, the client is timing out waiting for a response.
I found that, under load, XrdHttpReq
is send too little data. For example, given this request:
180830 22:41:28 16434 sysXrdHttp: rc:125 got hdr line: GET //user/ligo//frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf HTTP/1.1
180830 22:41:28 16434 sysXrdHttp: Parsing first line: GET //user/ligo//frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf HTTP/1.1
180830 22:41:28 16434 sysXrdHttp: rc:34 got hdr line: Host: server.example.com:1094
180830 22:41:28 16434 sysXrdHttp: rc:34 got hdr line: Range: bytes=125829120-150994943
180830 22:41:28 16434 sysXrdHttp: rc:13 got hdr line: Accept: */*
180830 22:41:28 16434 sysXrdHttp: rc:24 got hdr line: Connection: Keep-Alive
180830 22:41:28 16434 sysXrdHttp: rc:30 got hdr line: User-Agent: cvmfs Fuse 2.6.0
180830 22:41:28 16434 sysXrdHttp: rc:124 got hdr line: cvmfs-info: Part%20of%20/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 sysXrdHttp: rc:19 got hdr line: Connection: close
Given that byte range, we expect 23MB to be sent. Since Xrootd internally breaks this into 1MB chunks, that means there should be a total of 23 callbacks.
Sure enough, the first few look fine:
180830 22:41:28 16434 XrootdBridge: ligo.85978:126@hcc-briantest7 login as ligo
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0x7f1369562028 reqstate: 0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_stat: fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 acc_Audit: http grant ligo@[2600:900:6:1301:5054:ff:fe0b:9cba] stat /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Stat for GET /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf stat=0 513145657
24 1520985809
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_open: 0-600 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 acc_Audit: http grant ligo@[2600:900:6:1301:5054:ff:fe0b:9cba] read /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_fstat: fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=1
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 1
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: fhandle:0:0:0:0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Sending resp: 206 header len:128
180830 22:41:28 16434 sysXrdHttp: Sending 128 bytes
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 1
180830 22:41:28 16434 sysXrdHttp: XrdBridge::SetSF(false) failed.
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0
However, look at the end:
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_read: 1048576@128974848 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=1
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 5
180830 22:41:28 16434 VOMS failure (5): VOMS extension not found!
180830 22:41:28 16434 sysXrdHttp: Sending 1048576 bytes
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 5
180830 22:41:28 16434 sysXrdHttp: XrdBridge::SetSF(false) failed.
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_read: 1048576@130023424 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=1
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 6
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Got data vectors to send:1
180830 22:41:28 16434 sysXrdHttp: Sending 1048576 bytes
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 6
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:1
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_close: use=1 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_close: use=0 fn=dummy
180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Done
180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 7
Particularly, XrdHttpReq::ProcessHTTPReq
is returning 1
and issuing a ofs_close
command -- after only 5MB of data is sent out of 23MB requested!
The relevant code block is guarded by:
if ( ((reqstate == 3) && (rwOps.size() > 1)) ||
(writtenbytes >= length) ) {
So, writtenbytes >= length
.
For this request, I see no possible way writtenbytes
or length
are incorrectly calculated, nor do I see any way writtenbytes >= length
. A mystery thus far... but hopefully a pretty narrow one!
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/xrootd/xrootd","title":"xrootd/xrootd","subtitle":"GitHub repository","main_image_url":"https://assets-cdn.github.com/images/email/message_cards/header.png","avatar_image_url":"https://assets-cdn.github.com/images/email/message_cards/avatar.png","action":{"name":"Open in GitHub","url":"https://github.com/xrootd/xrootd"}},"updates":{"snippets":[{"icon":"PERSON","message":"@bbockelm in #813: Yeah, that was my first suspicion too. However, OpenSSL man pages swear up and down that an explicit flush isn't needed (and I don't see any buffering). I even tried adding printf statements .. no luck!\r\n\r\nHowever, I was able to make progress by manually adding `Connection: close` to the headers, which triggered a new error message. It appears the server is sending too little data and, when `keep-alive` is enabled, the client is timing out waiting for a response.\r\n\r\nI found that, under load, `XrdHttpReq` is send too little data. For example, given this request:\r\n```\r\n180830 22:41:28 16434 sysXrdHttp: rc:125 got hdr line: GET //user/ligo//frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf HTTP/1.1\r\n180830 22:41:28 16434 sysXrdHttp: Parsing first line: GET //user/ligo//frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf HTTP/1.1\r\n180830 22:41:28 16434 sysXrdHttp: rc:34 got hdr line: Host: server.example.com:1094\r\n180830 22:41:28 16434 sysXrdHttp: rc:34 got hdr line: Range: bytes=125829120-150994943\r\n180830 22:41:28 16434 sysXrdHttp: rc:13 got hdr line: Accept: */*\r\n180830 22:41:28 16434 sysXrdHttp: rc:24 got hdr line: Connection: Keep-Alive\r\n180830 22:41:28 16434 sysXrdHttp: rc:30 got hdr line: User-Agent: cvmfs Fuse 2.6.0\r\n180830 22:41:28 16434 sysXrdHttp: rc:124 got hdr line: cvmfs-info: Part%20of%20/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 sysXrdHttp: rc:19 got hdr line: Connection: close\r\n```\r\n\r\nGiven that byte range, we expect 23MB to be sent. Since Xrootd internally breaks this into 1MB chunks, that means there should be a total of 23 callbacks.\r\n\r\nSure enough, the first few look fine:\r\n```\r\n180830 22:41:28 16434 XrootdBridge: ligo.85978:126@hcc-briantest7 login as ligo\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0x7f1369562028 reqstate: 0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_stat: fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 acc_Audit: http grant ligo@[2600:900:6:1301:5054:ff:fe0b:9cba] stat /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Stat for GET /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf stat=0 513145657\r\n 24 1520985809\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_open: 0-600 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 acc_Audit: http grant ligo@[2600:900:6:1301:5054:ff:fe0b:9cba] read /user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_fstat: fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=1\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 1\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: fhandle:0:0:0:0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Sending resp: 206 header len:128\r\n180830 22:41:28 16434 sysXrdHttp: Sending 128 bytes\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 1\r\n180830 22:41:28 16434 sysXrdHttp: XrdBridge::SetSF(false) failed.\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0\r\n```\r\n\r\nHowever, look at the end:\r\n```\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_read: 1048576@128974848 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=1\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 5\r\n180830 22:41:28 16434 VOMS failure (5): VOMS extension not found!\r\n180830 22:41:28 16434 sysXrdHttp: Sending 1048576 bytes\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 5\r\n180830 22:41:28 16434 sysXrdHttp: XrdBridge::SetSF(false) failed.\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:0\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_read: 1048576@130023424 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Data! final=1\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 6\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Got data vectors to send:1\r\n180830 22:41:28 16434 sysXrdHttp: Sending 1048576 bytes\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process. lp:0 reqstate: 6\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: Process is exiting rc:1\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_close: use=1 fn=/user/ligo/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11822/H-H1_CLEANED_HOFT_C02-1182257395-4096.gwf\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 ofs_close: use=0 fn=dummy\r\n180830 22:41:28 16434 sysXrdHttp: XrdHttpReq::Done\r\n180830 22:41:28 16434 ligo.85978:126@hcc-briantest7 sysXrdHttp: PostProcessHTTPReq req: 2 reqstate: 7\r\n```\r\n\r\nParticularly, `XrdHttpReq::ProcessHTTPReq` is returning `1` and issuing a `ofs_close` command -- after only 5MB of data is sent out of 23MB requested!\r\n\r\nThe relevant code block is guarded by:\r\n```\r\n if ( ((reqstate == 3) \u0026\u0026 (rwOps.size() \u003e 1)) ||\r\n (writtenbytes \u003e= length) ) {\r\n```\r\n\r\nSo, `writtenbytes \u003e= length`.\r\n\r\nFor this request, I see no possible way `writtenbytes` or `length` are incorrectly calculated, nor do I see any way `writtenbytes \u003e= length`. A mystery thus far... but hopefully a pretty narrow one!"}],"action":{"name":"View Issue","url":"https://github.com/xrootd/xrootd/issues/813#issuecomment-417655140"}}}
[
{
"@context": "http://schema.org",
"@type": "EmailMessage",
"potentialAction": {
"@type": "ViewAction",
"target": "https://github.com/xrootd/xrootd/issues/813#issuecomment-417655140",
"url": "https://github.com/xrootd/xrootd/issues/813#issuecomment-417655140",
"name": "View Issue"
},
"description": "View this Issue on GitHub",
"publisher": {
"@type": "Organization",
"name": "GitHub",
"url": "https://github.com"
}
},
{
"@type": "MessageCard",
"@context": "http://schema.org/extensions",
"hideOriginalBody": "false",
"originator": "AF6C5A86-E920-430C-9C59-A73278B5EFEB",
"title": "Re: [xrootd/xrootd] XrdHttp loses requests under modest concurrency - v2 (#813)",
"sections": [
{
"text": "",
"activityTitle": "**Brian Bockelman**",
"activityImage": "https://assets-cdn.github.com/images/email/message_cards/avatar.png",
"activitySubtitle": "@bbockelm",
"facts": [
]
}
],
"potentialAction": [
{
"name": "Add a comment",
"@type": "ActionCard",
"inputs": [
{
"isMultiLine": true,
"@type": "TextInput",
"id": "IssueComment",
"isRequired": false
}
],
"actions": [
{
"name": "Comment",
"@type": "HttpPOST",
"target": "https://api.github.com",
"body": "{\n\"commandName\": \"IssueComment\",\n\"repositoryFullName\": \"xrootd/xrootd\",\n\"issueId\": 813,\n\"IssueComment\": \"{{IssueComment.value}}\"\n}"
}
]
},
{
"name": "Close issue",
"@type": "HttpPOST",
"target": "https://api.github.com",
"body": "{\n\"commandName\": \"IssueClose\",\n\"repositoryFullName\": \"xrootd/xrootd\",\n\"issueId\": 813\n}"
},
{
"targets": [
{
"os": "default",
"uri": "https://github.com/xrootd/xrootd/issues/813#issuecomment-417655140"
}
],
"@type": "OpenUri",
"name": "View on GitHub"
},
{
"name": "Unsubscribe",
"@type": "HttpPOST",
"target": "https://api.github.com",
"body": "{\n\"commandName\": \"MuteNotification\",\n\"threadId\": 373849645\n}"
}
],
"themeColor": "26292E"
}
]
Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1