Hi Julia,

The way we scaled xrootd monitoring to avoid burdening the server was to 
simply shift the workload to the collector. That meant that state 
information was reported "just in time" at the point it was available. 
Much of that state information is not kept beyond the reporting 
time interval. The information that is kept is not easily accessible, if 
it's accessible at all, by the time monitoring occurs. What we provide 
is a simple reference to previously reported data, which avoids 
retransmission of that data and keeps the amount of monitoring data sent 
over the wire in balance with the amount of data that we provide to the 
client. Internal monitoring APIs and structures are architected to fit 
into that scheme. This was not short-sighted. It allowed us to provide 
far more extensive information at a very low cost, accommodating divergent 
and evolving needs without having to revisit the server implementation 
which, frankly, is a very expensive proposition.
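
To illustrate the scheme described above, here is a minimal sketch of how a
collector might join a file close report with data reported earlier under
the same reference. The record layout and the field names (kind, ref_id,
vo, time, path, bytes_read) are hypothetical and chosen only for
illustration; they are not the actual xrootd monitoring format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Collector:
    # ref_id -> attributes the server reported once, earlier in the stream
    # (hypothetical layout, not the real xrootd record format)
    known: Dict[int, dict] = field(default_factory=dict)
    complete: List[dict] = field(default_factory=list)
    incomplete: List[dict] = field(default_factory=list)

    def on_record(self, rec: dict) -> None:
        if rec["kind"] == "open":
            # Keep the just-in-time state so a later close can refer to it.
            self.known[rec["ref_id"]] = {
                "vo": rec["vo"],
                "start": rec["time"],
                "path": rec["path"],
            }
        elif rec["kind"] == "close":
            summary = {
                "ref_id": rec["ref_id"],
                "end": rec["time"],
                "bytes_read": rec["bytes_read"],
            }
            info = self.known.pop(rec["ref_id"], None)
            if info is None:
                # The earlier packet was lost: the report cannot be completed.
                self.incomplete.append(summary)
            else:
                self.complete.append({**info, **summary})


collector = Collector()
for record in [
    {"kind": "open", "ref_id": 1, "vo": "atlas", "time": 100, "path": "/data/a"},
    {"kind": "close", "ref_id": 1, "time": 160, "bytes_read": 4096},
    # The open record for ref_id 2 is missing, as if its UDP packet was lost.
    {"kind": "close", "ref_id": 2, "time": 170, "bytes_read": 512},
]:
    collector.on_record(record)
```

In this sketch the close for ref_id 1 resolves to a full report with VO and
start time, while the close for ref_id 2 stays incomplete because the data it
refers to never arrived.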

Yes, that all works as long as you don't lose too many UDP packets. That 
is why the reliable conversion of UDP to MQ is the long term solution and 
will address the specific problem at hand. You will be able to get exactly 
the information you need once we can stop losing packets.
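
As a rough sketch of the UDP-to-MQ idea, the following relays datagrams
received on a local socket to a publish callback. It assumes a local
in-process queue as a stand-in for a real message broker, and the function
name relay_datagrams is hypothetical; it is not the prototype mentioned in
this thread.

```python
import queue
import socket
from typing import Callable


def relay_datagrams(sock: socket.socket,
                    publish: Callable[[bytes], None],
                    count: int) -> int:
    """Receive up to `count` datagrams from `sock` and pass each to `publish`."""
    relayed = 0
    for _ in range(count):
        try:
            payload, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break  # nothing more arrived within the socket timeout
        publish(payload)
        relayed += 1
    return relayed


# Stand-in for a broker connection; a real setup would publish to the MQ.
mq_stand_in = queue.Queue()

# Listen on the loopback interface, where packet loss is unlikely.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(1.0)

# Simulate the server sending one monitoring datagram to the local collector.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"monitoring-record", receiver.getsockname())

relayed = relay_datagrams(receiver, mq_stand_in.put, count=1)

sender.close()
receiver.close()
```

The point of the design is that the lossy hop stays on the local host, and
everything downstream of the publish call travels over a reliable transport.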

I ask for your patience while we implement what will be the satisfactory 
solution for the whole community.

Andy

On Fri, 25 Jun 2021, Julia Andreeva wrote:

> Hi Andrew,
>
> Thank you for the prompt answer.
> Yes, we know about the development of the local UDP collector converting data into MQ stream.
> This is certainly a good direction.
>
> However, this does not simplify the task of processing a lot of reports related to the same operation further along the chain, which has an impact on the collector's scalability. We were wondering whether we could substantially decrease the amount of data to be processed if the file close report contained just a few more attributes.
> If I understand your argument about serious architectural changes correctly, it means that the xrootd server does not keep track of the complete information related to a given operation during the lifetime of that operation. Is that correct? For example, the owner of a given operation should be preserved during the lifetime of the operation, shouldn't it?
>
> Thank you
>
> Cheers
> Julia
> ________________________________________
> From: [log in to unmask] [[log in to unmask]] on behalf of Andrew Hanushevsky [[log in to unmask]]
> Sent: 25 June 2021 09:40
> To: Julia Andreeva
> Cc: [log in to unmask]; Alessandra Forti; Maarten Litmaath
> Subject: Re: xrootd monitoring request
>
> Hi Julia,
>
> The problem is that it would require significant architectural changes,
> not to mention a whole new set of documentation. We are taking a different
> track to address this problem that we think is a more general solution
> that should address a wider range of needs.
>
> We already have two collectors that know how to piece the information
> together to essentially provide what you are requesting. The problem is
> that if one of the required UDP packets is lost it makes it impossible
> to provide complete information. The plan is to provide a local UDP
> collector where the likelihood of packet loss is minimal and convert
> the stream to a MQ stream (e.g. rabbitMQ). That would minimize
> information loss. We have one prototype and our collaborators are working
> on some others.
>
> I don't have an exact timeline but given your request I will try to
> establish one that is hopefully timely. This particular effort is
> community supported so we need to take that into account. However, we do
> have OSG and two other collaborative partners participating so it
> shouldn't take too long to provide a solution. I hope that this addresses
> your request.
>
> Andy
>
>
>
>  On Thu, 24 Jun 2021, Julia Andreeva wrote:
>
>> Dear colleagues,
>>
>> We are discussing various options to improve the situation with xrootd monitoring on the WLCG infrastructure.
>> One of the options we are considering is to try the ALICE approach, which consists of processing only file close reports. This looks efficient in terms of reducing the volume of data to be processed by the collector. However, ALICE uses dedicated xrootd servers, while for monitoring of the xrootd traffic on the WLCG infrastructure we need to know the owner of the operation in terms of the VO.
>>
>> Two other things which would be useful to have in the file close report are the start time stamp of the operation and the application level meta info, which it is now possible to provide with the monitoring reports but which, if I am not mistaken, is sent at the beginning.
>>
>> So our question is: would it be possible to extend the file close report with information about the VO, the start time stamp of the operation and the application level meta info?
>>
>> Please, let us know.
>>
>> Thank you
>>
>> Kind regards
>> Julia
>>
>> ########################################################################
>> Use REPLY-ALL to reply to list
>>
>> To unsubscribe from the XROOTD-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>
