I will try to answer point by point

The issue here is that the xrootd@storage06 is not rendezvousing with the cmsd@storage02 and hence the cmsd will not start talking to the redirector until that happens. The line here in the cmsd.log:

erm, is storage02 above a typo, or is something wrong in the settings? the configuration specifies rd.spacescience.ro as the redirector and i see no mention of storage02

201209 12:26:29 17136 Start: Waiting for primary server to login.

On the other hand, the xrootd thinks it's connected to the cmsd as shown in the xrd.log:
201209 12:26:30 17160 cms_Finder: Connected to cmsd via /tmp/xrootd_1000/storage01/storage01/.olb/olbd.admin

When this happens it means that the path to the named pipe for the xrootd is not the same as the one the cmsd is using. The xrootd wants to use "/tmp/xrootd_1000/storage01/storage01/.olb/olbd.admin". Unfortunately, we don't display that in the cmsd log and we will correct that in a future release.
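To see which admin path the xrootd actually picked, the path can be pulled out of the xrd.log line quoted above. This is only a minimal sketch: it assumes the log keeps the `Connected to cmsd via <path>` wording shown in this thread, and the function name `admin_path_from_log` is hypothetical, not part of xrootd.

```python
import re
from typing import Optional

# Pattern derived only from the xrd.log line quoted in this thread;
# other xrootd versions may word the message differently (assumption).
_CONNECTED = re.compile(r"cms_Finder: Connected to cmsd via (\S+)")


def admin_path_from_log(line: str) -> Optional[str]:
    """Return the path named in a 'Connected to cmsd via' line, else None."""
    match = _CONNECTED.search(line)
    return match.group(1) if match else None


xrd_line = (
    "201209 12:26:30 17160 cms_Finder: Connected to cmsd via "
    "/tmp/xrootd_1000/storage01/storage01/.olb/olbd.admin"
)
print(admin_path_from_log(xrd_line))
```

The printed path can then be compared by hand with whatever admin path the cmsd was started with, since the cmsd log does not show its own.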

ooh.. that fencing !! so the problem is that pidfile is not fenced and admin path is .. and i did:
https://github.com/adriansev/alicexrd/blob/master/xrdctl#L218
https://github.com/adriansev/alicexrd/blob/master/xrdctl#L242
i fixed this, but it still does not work, i updated the logs in the cernbox share.

How does this happen? Usually because a) the instance names differ (i.e. the -n command line option), or b) the adminpaths differ. Since there was no all.adminpath directive in the config file, it must have been supplied on the command line using the -a or the -A command line option. Unfortunately, we do not display the command line in the log file so I can't tell (we will correct that in a future release).
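Until the command line shows up in the log, the two daemons' arguments can be compared directly. The sketch below is an illustration, not an xrootd tool: it assumes Linux (`/proc/<pid>/cmdline`), assumes each option is separated from its value by whitespace, and only compares the literal values of the options named above (-n, -a, -A) without knowing how xrootd combines them. All function names and the sample argument lists are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Options named in this thread: -n (instance name), -a / -A (admin path).
WATCHED = ("-n", "-a", "-A")


def watched_options(argv: List[str]) -> Dict[str, str]:
    """Collect the value that follows each watched option in an argv list."""
    found: Dict[str, str] = {}
    for i, arg in enumerate(argv[:-1]):
        if arg in WATCHED:
            found[arg] = argv[i + 1]
    return found


def read_cmdline(pid: int) -> List[str]:
    """Read a running process's argv from /proc (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as fh:
        return [part.decode() for part in fh.read().split(b"\0") if part]


def mismatches(
    xrootd_argv: List[str], cmsd_argv: List[str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return watched options whose values differ between the two daemons."""
    a, b = watched_options(xrootd_argv), watched_options(cmsd_argv)
    return {
        opt: (a.get(opt), b.get(opt))
        for opt in WATCHED
        if a.get(opt) != b.get(opt)
    }


# Hypothetical command lines, for illustration only.
sample_xrootd = ["xrootd", "-n", "storage01", "-c", "xrd.cf"]
sample_cmsd = ["cmsd", "-n", "storage02", "-c", "xrd.cf"]
print(mismatches(sample_xrootd, sample_cmsd))
```

With real processes, `read_cmdline(pid)` would supply the two argument lists; an empty result means the watched options agree, which points the search elsewhere.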

the starting commands are:
https://github.com/adriansev/alicexrd/blob/master/xrdctl#L234
https://github.com/adriansev/alicexrd/blob/master/xrdctl#L257

Some unrelated observations:

  1. Placing the adminpath in "/tmp" is highly discouraged. Various control files are placed in this path and many sites (perhaps not yours) run a /tmp cleanup daemon that happily deletes old files in /tmp. If that happens to files in the adminpath the system will likely stop working. The message in the log draws attention to this:
    Config warning: adminpath resides in /tmp and may be unstable!

yes, i know, i wanted a generally available directory so i used /tmp .. now that you mention it, i could just use /var/tmp or the user home and be done with it ... but so far it was safe... on centos /tmp is not tmpfs (like in fedora) and for certain there is NO periodic cleaning of /tmp in any systems that use systemd

  2. There is an error in the config file as noted by the message:
    Config warning: ignoring unknown directive 'e/aliprod/data'.
    =====> oss.namelib /usr/lib64/libXrdAggregatingName2Name.so /home/aliprod/data
    This is a bug (thank you for tripping over it) and will be corrected in 5.1.0.

oohh .. ok... will this require another recompilation/rebuild of XrdAggregatingName2Name plugin? (this is an ALICE specific plugin, packaged by me and in use by the ALICE vanilla xrootd installations)

  3. There is no reason to scan idle links for the cmsd and, in fact, doing so is discouraged as it causes unnecessary disconnections. The xrd.timeout directive should only apply to the xrootd. This is especially true because of the remarkably short idle timeout of 10 seconds that causes the system to scan all the links every 6 seconds. Is it necessary to have such a short timeout?

well, it's always better to free unused resources so as not to have contention of any kind .. so, if idle then the job is done, let's close it .. but i do not get 2 things:

  1. given that xrd.timeout is prefixed with xrd i imagined that it automatically applies only to xrootd .. i imagine that if it were a general instruction, there could be an all.timeout and, specifically for cmsd, a cms.timeout ... isn't it like this?
  2. i do not get the scan thing... is this a hidden implication? like, to kill after N seconds the connection list must be checked every N - safe_value ?

  4. I hope starting the servers in debugging mode was for my benefit. Running servers in this debugging mode easily reduces performance by as much as 50% along with creating huge log files.

obviously, given that things do not work, of course i enabled the debug :)

So, all I need to see is what the startup command line was for each server (xrootd and cmsd) and that will tell the story.

https://github.com/adriansev/alicexrd/blob/master/xrdctl#L234
https://github.com/adriansev/alicexrd/blob/master/xrdctl#L257

We can discuss over email to give you a reminder about your access to the machine :)
