When using SSI streams, there are two environment variables which control the length of timeouts, XRD_REQUESTTIMEOUT and XRD_STREAMTIMEOUT. It is not clear how these two timeouts are supposed to interact and in some cases the behaviour is strange.

Example setup: I have a simple XRootD SSI server which does round-robin requests. The client sends a simple payload and a number of records to return, and the server sends back the payload the requested number of times in an SSI stream. For this test, the server adds a 5-second delay to each chunk of the stream, to better observe how the timeouts interact.

Testing with test-client stream 1000, which returns 1000 records in a stream. The requested buffer size is 1024 bytes and the payload is in the order of 100 bytes, so we expect ~10 chunks to be returned on the stream. As we added a 5 sec delay to each chunk, the total processing time is around 50s.

CASE 1. XRD_REQUESTTIMEOUT=1, XRD_STREAMTIMEOUT=2

$ time test/src/test-client stream 1000
180925 11:41:19 22835 ssi_Pb::ServiceClientSide: pid:22835 tid:140317381859456 ServiceClientSide constructor: Request timeout = 1s
180925 11:41:19 22835 ssi_Pb::IStreamBuffer: pid:22835 tid:140317381859456 Called IStreamBuffer() constructor
180925 11:41:19 22835 ssi_Pb::Request: pid:22835 tid:140317381859456 Request() constructor: Response buffer size = 1024 bytes, request timeout = 1s
180925 11:41:19 22843 ssi_Pb::Request: pid:22835 tid:140317244593920 ProcessResponse(): response type = isStream
180925 11:41:19 22843 ssi_Pb::Request: pid:22835 tid:140317244593920 ProcessResponseMetadata(): received 74 bytes
  Count     Int64 Value    Double Value Bool  String Value
180925 11:41:20 22842 ssi_Pb::Request: pid:22835 tid:140317252986624 ProcessResponseData(): received -1 bytes
terminate called after throwing an instance of 'XrdSsiPb::XrdSsiException'
  what():  [ERROR] Operation expired
Aborted

real	0m1.022s

In this case, the server returns the stream handle within the 1s, but it is still the REQUEST timeout which triggers, not the STREAM timeout.

CASE 2. XRD_REQUESTTIMEOUT=3, XRD_STREAMTIMEOUT=2

Same as above, server responds in time, but the REQUEST timeout is triggered, not the STREAM timeout.

CASE 3. XRD_REQUESTTIMEOUT=6, XRD_STREAMTIMEOUT=2

$ time test/src/test-client stream 1000
180925 11:45:03 24983 ssi_Pb::ServiceClientSide: pid:24983 tid:140377820215424 ServiceClientSide constructor: Request timeout = 6s
180925 11:45:03 24983 ssi_Pb::IStreamBuffer: pid:24983 tid:140377820215424 Called IStreamBuffer() constructor
180925 11:45:03 24983 ssi_Pb::Request: pid:24983 tid:140377820215424 Request() constructor: Response buffer size = 1024 bytes, request timeout = 6s
180925 11:45:03 24990 ssi_Pb::Request: pid:24983 tid:140377691342592 ProcessResponse(): response type = isStream
180925 11:45:03 24990 ssi_Pb::Request: pid:24983 tid:140377691342592 ProcessResponseMetadata(): received 74 bytes
  Count     Int64 Value    Double Value Bool  String Value
180925 11:45:05 24991 ssi_Pb::Request: pid:24983 tid:140377682949888 ProcessResponseData(): received -1 bytes
terminate called after throwing an instance of 'XrdSsiPb::XrdSsiException'
  what():  [ERROR] Socket error
Aborted

real	0m2.025s

This time the STREAM timeout is triggered, and the error is Socket error instead of Operation expired.

CASE 4. XRD_REQUESTTIMEOUT=6, XRD_STREAMTIMEOUT=7

$ time test/src/test-client stream 1000
180925 11:27:07 14169 ssi_Pb::ServiceClientSide: pid:14169 tid:139848111949952 ServiceClientSide constructor: Request timeout = 10s
180925 11:27:07 14169 ssi_Pb::IStreamBuffer: pid:14169 tid:139848111949952 Called IStreamBuffer() constructor
180925 11:27:07 14169 ssi_Pb::Request: pid:14169 tid:139848111949952 Request() constructor: Response buffer size = 1024 bytes, request timeout = 10s
180925 11:27:07 14177 ssi_Pb::Request: pid:14169 tid:139847983077120 ProcessResponse(): response type = isStream
180925 11:27:07 14177 ssi_Pb::Request: pid:14169 tid:139847983077120 ProcessResponseMetadata(): received 74 bytes
  Count     Int64 Value    Double Value Bool  String Value
180925 11:27:12 14178 ssi_Pb::Request: pid:14169 tid:139847974684416 ProcessResponseData(): received 1024 bytes
      1               0               0 false 
      2               0               0 false 
      3               0               0 false 
      4               0               0 false 
      5               0               0 false 
      6               0               0 false 
...
    920               0               0 false 
    921               0               0 false 
    922               0               0 false 
    923               0               0 false 
    924               0               0 false 
180925 11:49:45 27486 ssi_Pb::Request: pid:27472 tid:140401844549376 ProcessResponseData(): request more response data
180925 11:49:51 27487 ssi_Pb::Request: pid:27472 tid:140401836156672 ProcessResponseData(): received -1 bytes
terminate called after throwing an instance of 'XrdSsiPb::XrdSsiException'
  what():  [ERROR] Operation expired
Aborted

real	0m46.033s

In this case, the client receives 9 of the 10 chunks, but times out on the last one with Operation expired.

Here are the server-side logs for this case:

180925 11:49:05 1055 XrootdXeq: ctadev.27472:22@localhost pvt IP64 login
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808 Prepare():
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808    Resource name: /test
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808    Resource user: 
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808    Resource info: 
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808    Hosts to avoid: 
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808    Affinity: Default
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808    Resource options: 
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808    Resource Client: 
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808       Protocol:     host
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808       Name:         NULL
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808       Host:         localhost
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808       Vorg:         NULL
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808       Role:         NULL
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808       Grps:         NULL
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808       Endorsements: NULL
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808       Creds:        NULL
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808       Credslen:     0
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808       Rsvd:         0
180925 11:49:05 1055 ssi_Pb::Service: pid:30671 tid:140715573335808       Tident:       
180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_open: /test prepared.
180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_write: 0:/test rsz=5 wsz=5
180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_Activate: 0:/test [new wtReq] oucbuff rqsz=5
180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_DoIt: 0:/test [begun xqReq] Calling service processor
180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_BindDone: 0:/test [begun xqReq] Bind called; for request 0
180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_GetRequest: 0:/test [bound xqReq] sz=5
180925 11:49:05 30749 ssi_Pb::RequestProc: pid:30671 tid:140715575441152 RequestProc(): received 5 bytes
180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_RelReqBuff: 0:/test [bound xqReq] called
180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_ProcessResponse: 0:/test [bound xqReq] Response presented wtr=False
180925 11:49:05 30749 ctadev.27472:22@localhost Ssi_ProcessResponse: 0:/test [bound doRsp] Resp strm
180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_fctl: 0:/test query resp status
180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_AttnInfo: 0:/test 74 byte metadata (0x080112461b5b33...) sent.
180925 11:49:05 1055 ctadev.27472:22@localhost Ssi_fctl: 0:/test resp ready
180925 11:49:10 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1024 bytes)
180925 11:49:10 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1025 bytes of data.
180925 11:49:15 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1023 bytes)
180925 11:49:15 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data.
180925 11:49:20 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1021 bytes)
180925 11:49:20 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data.
180925 11:49:25 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1019 bytes)
180925 11:49:25 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data.
180925 11:49:30 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1017 bytes)
180925 11:49:30 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1017 bytes of data.
180925 11:49:35 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1024 bytes)
180925 11:49:35 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data.
180925 11:49:40 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1022 bytes)
180925 11:49:40 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data.
180925 11:49:45 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1020 bytes)
180925 11:49:45 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 1026 bytes of data.
180925 11:49:50 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (1018 bytes)
180925 11:49:50 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): Returning buffer with 675 bytes of data.
180925 11:49:55 1055 ssi_TestStream: pid:30671 tid:140715573335808 GetBuff(): XrdSsi buffer fill request (343 bytes)
180925 11:49:55 1055 ctadev.27472:22@localhost Ssi_Finalize: 0:/test [bound odRsp] Calling Finished(False)
180925 11:49:55 30749 ctadev.27472:22@localhost Ssi_Dispose: 0:/test [bound odRsp] Recycling request...
180925 11:49:55 30676 XrdPoll: Sever event occured for ctadev.27472:22@localhost
180925 11:49:55 1055 XrdLink: Unable to send to ctadev.27472:22@localhost; broken pipe
180925 11:49:55 1055 XrootdXeq: ctadev.27472:22@localhost disc 0:00:50 (send failure)
180925 11:49:55 1055 ctadev.27472:22@localhost Ssi_close: /test del=True

CASE 5. XRD_REQUESTTIMEOUT=6, XRD_STREAMTIMEOUT=60

This gives exactly the same behaviour as above.

CASE 6. XRD_REQUESTTIMEOUT=15, XRD_STREAMTIMEOUT=6

This is the same as above, except it gives a Socket error instead of Operation expired.

CASE 7. XRD_REQUESTTIMEOUT=11, XRD_STREAMTIMEOUT=60

Finally, success.

SUMMARY

The behaviour of XRD_REQUESTTIMEOUT and XRD_STREAMTIMEOUT and their interaction is not documented, so I am trying to figure out how it should work by trial and error.

What I was expecting is that XRD_REQUESTTIMEOUT would place a limit on the time to receive the stream handle from the server, and XRD_STREAMTIMEOUT would place a limit on the time to receive each chunk of the stream. But it seems this is not the case.

It looks like XRD_REQUESTTIMEOUT is used for two distinct purposes, one for the time to receive the stream handle, and also for the time to receive each chunk of the stream. Also XRD_REQUESTTIMEOUT needs to be set to at least double the time taken to process one chunk, as the final chunk of the stream requires two calls to the stream object (one to partially fill the buffer with the last remaining data, then again to set last = true and return nullptr).

XRD_STREAMTIMEOUT is used to control the total amount of time to receive the entire stream, and there is no way to independently control the timeout on each chunk of the stream.

Please can you confirm that this is the intended behaviour of these timeouts?

XRD_STREAMTIMEOUT is not so useful in its current form. We need a way to limit the time taken to process each chunk of the stream, not the total time.

Also we need to be able to control the time taken to process each chunk of the stream independently from the request timeout.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/xrootd/xrootd","title":"xrootd/xrootd","subtitle":"GitHub repository","main_image_url":"https://assets-cdn.github.com/images/email/message_cards/header.png","avatar_image_url":"https://assets-cdn.github.com/images/email/message_cards/avatar.png","action":{"name":"Open in GitHub","url":"https://github.com/xrootd/xrootd"}},"updates":{"snippets":[{"icon":"DESCRIPTION","message":"[SSI] Problems with timeouts during SSI streaming (#829)"}],"action":{"name":"View Issue","url":"https://github.com/xrootd/xrootd/issues/829"}}} [ { "@context": "http://schema.org", "@type": "EmailMessage", "potentialAction": { "@type": "ViewAction", "target": "https://github.com/xrootd/xrootd/issues/829", "url": "https://github.com/xrootd/xrootd/issues/829", "name": "View Issue" }, "description": "View this Issue on GitHub", "publisher": { "@type": "Organization", "name": "GitHub", "url": "https://github.com" } }, { "@type": "MessageCard", "@context": "http://schema.org/extensions", "hideOriginalBody": "false", "originator": "AF6C5A86-E920-430C-9C59-A73278B5EFEB", "title": "[SSI] Problems with timeouts during SSI streaming (#829)", "sections": [ { "text": "", "activityTitle": "**Michael**", "activityImage": "https://assets-cdn.github.com/images/email/message_cards/avatar.png", "activitySubtitle": "@slithy", "facts": [ { "name": "Repository: ", "value": "xrootd/xrootd" }, { "name": "Issue #: ", "value": 829 } ] } ], "potentialAction": [ { "name": "Add a comment", "@type": "ActionCard", "inputs": [ { "isMultiLine": true, "@type": "TextInput", "id": "IssueComment", "isRequired": false } ], "actions": [ { "name": "Comment", "@type": "HttpPOST", "target": "https://api.github.com", "body": "{\n\"commandName\": \"IssueComment\",\n\"repositoryFullName\": \"xrootd/xrootd\",\n\"issueId\": 829,\n\"IssueComment\": \"{{IssueComment.value}}\"\n}" } ] }, { "name": "Close issue", "@type": "HttpPOST", "target": "https://api.github.com", "body": "{\n\"commandName\": \"IssueClose\",\n\"repositoryFullName\": \"xrootd/xrootd\",\n\"issueId\": 829\n}" }, { "targets": [ { "os": "default", "uri": "https://github.com/xrootd/xrootd/issues/829" } ], "@type": "OpenUri", "name": "View on GitHub" }, { "name": "Unsubscribe", "@type": "HttpPOST", "target": "https://api.github.com", "body": "{\n\"commandName\": \"MuteNotification\",\n\"threadId\": 383855100\n}" } ], "themeColor": "26292E" } ]

Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1