Somebody should probably have a look at this. Forwarding for
documentation purposes.

   Lukasz

Begin forwarded message:

Date: Sun, 17 May 2015 23:19:47 +0200
From: Lukasz Janyst <[log in to unmask]>
To: Andreas-Joachim Peters <[log in to unmask]>
Subject: Re: 10G Question


Hi Andreas,

   this was not fixed, but from what I could tell this might actually
have been related to the particular version of kernel/driver.

   The situation is as follows:

1) If you only have one active socket on a 10G interface you get only
around 50% of bandwidth utilisation.
2) If you have two or more active sockets you saturate the link.

   Now, from my observation/research:

1) Socket watermarking (the SO_RCVLOWAT sockopt and friends) is either
not implemented or broken in Linux.
2) When a tiny bit of data arrives in the NIC buffer, the TCP stack
triggers a poll event for the descriptor. This results in a switch to
user space.
3) This tiny bit of data is then retrieved from the descriptor using a
read call on a non-blocking socket (another user-to-kernel-space switch
and back).
4) Another read request is issued (another trip to kernel space and
back) on the socket, which this time results in errno being set to
EWOULDBLOCK, and the client logic decides to go back to the poll
syscall, which quickly gets triggered again, and we're back at 2. The
cycle repeats itself, and we end up spending all of the time doing
useless context switches, which could easily be prevented if
SO_RCVLOWAT actually worked.
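
   To illustrate the cycle in steps 2 to 4, here is a minimal Python
sketch of a poll/recv loop on a non-blocking socket. It is not taken from
the XRootD client; the function name `drain`, its parameters and the
counters are hypothetical and exist only for this example. It counts poll
wake-ups, recv attempts and EWOULDBLOCK results, and can optionally set
SO_RCVLOWAT so the counts can be compared with and without the watermark.
Whether poll actually honours the watermark depends on the kernel in use.

```python
import select
import socket


def drain(sock, total, chunk=65536, lowat=None, timeout_ms=5000):
    """Read `total` bytes from `sock` using a poll/recv loop.

    Returns a dict with the bytes read, the number of poll wake-ups,
    the number of recv attempts and the number of EWOULDBLOCK results.
    """
    sock.setblocking(False)
    if lowat is not None:
        # Ask the kernel to treat the socket as readable only once `lowat`
        # bytes are queued. Assumption: the running kernel honours this
        # in poll(); the message above reports that it did not.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVLOWAT, lowat)

    poller = select.poll()
    poller.register(sock, select.POLLIN)
    stats = {"bytes": 0, "polls": 0, "recvs": 0, "wouldblock": 0}

    while stats["bytes"] < total:
        # Step 2: wait in the kernel until the socket is reported readable.
        if not poller.poll(timeout_ms):
            break  # timed out, give up
        stats["polls"] += 1
        while stats["bytes"] < total:
            # Steps 3 and 4: keep reading until the kernel says "would block".
            stats["recvs"] += 1
            try:
                data = sock.recv(min(chunk, total - stats["bytes"]))
            except BlockingIOError:
                stats["wouldblock"] += 1
                break  # back to poll
            if not data:
                return stats  # peer closed the connection
            stats["bytes"] += len(data)
    return stats
```

   Running `drain` against a single slow-trickling TCP stream and looking
at the ratio of `polls` and `wouldblock` to `bytes` gives a rough picture
of how many syscall round trips are spent per byte delivered.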

   For two or more sockets, the timing changes and the poll events are
triggered when larger chunks of data have been accumulated. The more
sockets/connections we have the better it gets.

   I vaguely recall that there was an ioctl option controlling this,
but I cannot find it now. Alternatively, you can have a look at the
driver or implement proper socket watermarking in the TCP stack of
Linux :)

   I hope this helps.

Cheers,
   Lukasz

On Wed, 13 May 2015 16:55:48 +0200
Andreas-Joachim Peters <[log in to unmask]> wrote:

> Hi Lukasz,
> 
> in case you remember something quickly.
> 
> We have some problems with the performance of copy commands over 10G.
> 
> If I compare the iperf performance with copy performance of xrootd I
> get only 50% with xrootd.
> 
> I played with the chunksize etc. but with the 3.3.6 versions it seems
> not to change much.
> 
> Do you remember anything in particular that was fixed only in the 4.x
> version concerning window scaling etc.?
> 
> I hope you are all well.
> 
> Cheers Andreas.
