On 9/11/19 12:15 PM, Michal Kamil Simon wrote:
> Hi Adrian,
Hi!

> The ZIP file you use for testing is compressed; for now we only support
> extraction from ZIP files that are not compressed (our use case is ROOT
> files that use the ZIP format only for bundling).
oh...

> You can create an uncompressed ZIP file for testing using the following
> command:
> *zip -0 foo.zip /path/to/files/foo*
all ALICE data and MC central productions are stored in compressed zip
files, so support limited to uncompressed zips does not cover our use case.

Is there any chance of also supporting extraction from compressed zips?

My naive thinking is that, since you already use vector reads to fetch the
central directory record and then the file entries (local file header,
encryption header, file data and data descriptor), you could use the info
from the local file header (compression method, sizes) to decompress the
data (from what i see at first glance in
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT).
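
To make the idea concrete, here is a minimal local sketch in plain Python
(standard library only, not XrdCl code) of the steps described above: find
the end-of-central-directory record, walk the central directory to the
wanted entry, skip its local file header, and inflate the raw deflate
stream. The function name extract_member is hypothetical. The sketch
assumes no ZIP64, no encryption, and only methods 0 (stored) and 8
(deflate); it takes the method and compressed size from the central
directory, since the local header may carry zero sizes when a data
descriptor is used.

```python
import io
import struct
import zlib

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CDH_SIG = b"PK\x01\x02"   # central directory file header
LFH_SIG = b"PK\x03\x04"   # local file header


def extract_member(f, name):
    """Return the bytes of `name` from a seekable ZIP stream `f`.

    Sketch only: no ZIP64, no encryption, methods 0 and 8.
    """
    # 1. Locate the end-of-central-directory record in the archive tail.
    f.seek(0, io.SEEK_END)
    size = f.tell()
    tail_len = min(size, 22 + 0xFFFF)  # 22-byte record + optional comment
    f.seek(size - tail_len)
    tail = f.read(tail_len)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("not a ZIP archive")
    n_entries, cd_size, cd_offset = struct.unpack(
        "<HII", tail[pos + 10:pos + 20])

    # 2. Walk the central directory until the requested entry is found.
    f.seek(cd_offset)
    cd = f.read(cd_size)
    p = 0
    for _ in range(n_entries):
        if cd[p:p + 4] != CDH_SIG:
            raise ValueError("corrupt central directory")
        method, = struct.unpack("<H", cd[p + 10:p + 12])
        comp_size, = struct.unpack("<I", cd[p + 20:p + 24])
        name_len, extra_len, comment_len = struct.unpack(
            "<HHH", cd[p + 28:p + 34])
        lfh_offset, = struct.unpack("<I", cd[p + 42:p + 46])
        # Assumes ASCII/UTF-8 entry names.
        entry_name = cd[p + 46:p + 46 + name_len].decode("utf-8")
        if entry_name == name:
            break
        p += 46 + name_len + extra_len + comment_len
    else:
        raise KeyError(name)

    # 3. Read the local file header to learn where the data starts.
    f.seek(lfh_offset)
    lfh = f.read(30)
    if lfh[:4] != LFH_SIG:
        raise ValueError("corrupt local file header")
    lfh_name_len, lfh_extra_len = struct.unpack("<HH", lfh[26:30])
    f.seek(lfh_offset + 30 + lfh_name_len + lfh_extra_len)
    data = f.read(comp_size)

    # 4. Decompress according to the recorded method.
    if method == 0:
        return data
    if method == 8:
        return zlib.decompress(data, wbits=-15)  # raw deflate, no zlib header
    raise NotImplementedError(f"compression method {method}")
```

Against a remote file, steps 1 to 3 would map onto ranged reads instead of
seek/read on a local stream; how that fits into the existing client code is
of course for the developers to judge.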

Or should i instead download the zip file to a temporary location and
extract the needed files on the client?
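
For reference, the fallback i have in mind would look roughly like the
sketch below. The helper fetch_and_extract and its download argument are
hypothetical: download(local_path) stands for whatever actually copies the
archive (for example an xrdcp call or a CopyProcess job) and is not shown
here. The obvious cost is that the whole archive is transferred even when
only one member is needed.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def fetch_and_extract(download, member, dest):
    """Download an archive to a temporary directory and extract one member.

    `download(local_path)` is a hypothetical callable that writes the zip
    archive to `local_path`; the transfer mechanism is left to the caller.
    """
    dest = Path(dest)
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "archive.zip"
        download(str(zip_path))
        with zipfile.ZipFile(zip_path) as zf:
            # Stream the member so a multi-GB file is never held in memory.
            with zf.open(member) as src, open(dest, "wb") as out:
                shutil.copyfileobj(src, out)
    # The temporary archive is removed when the with-block exits.
    return dest
```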

Thanks a lot!!
Adrian



> 
> Cheers,
> Michal
> ________________________________________
> From: Adrian Sevcenco
> Sent: 11 September 2019 09:59
> To: Michal Kamil Simon; [log in to unmask]
> Subject: Re: python :: cp process fails when it shouldn't (another 
> utility can download file)
> 
> On 9/10/19 2:27 PM, Michal Kamil Simon wrote:
>  > Hi Adrian,
> Hi Michal!
> 
>  > Yes, this should also work from python.
>  >
>  > You just need to add the ?xrdcl.unzip=fn to the metalink url (nothing
>  > else), also remove the #fn from the files inside the metalink.
> well, it is not working and i get this kind of messages:
> jobID: 1/1 >>> STATUS: 1 ; ERRNO: 0 ; CODE: 13 ; MESSAGE: [ERROR]
> Operation not supported: Decompression is not supported!
> 
> full xrootd log, and my cli debug output (and metalink) are here:
> https://cernbox.cern.ch/index.php/s/JNaLKsaC5pyrhMP
> 
> the actual metalink was named with % but it seems that EOS has a bug
> and deletes files with % in their names (i cannot copy the original
> meta file through webdav)
> 
> Thanks a lot!!
> Adrian
> 
> 
>  >
>  > To make the xrdcl.unzip work for the links/files inside metalink is
>  > more complicated, if this is something you would be interested in,
>  > please create an issue in github, but I don't think we will be able to
>  > accommodate this feature request for 4.10.1.
>  >
>  > It is the client that is extracting the file from the zip archive -
>  > basically the client reads the central directory record of the zip
>  > archives in order to know the offset of the respective file within the
>  > archive.
>  >
>  > Cheers,
>  > Michal
>  > ________________________________________
>  > From: Adrian Sevcenco
>  > Sent: 10 September 2019 13:11
>  > To: Michal Kamil Simon; [log in to unmask]
>  > Subject: Re: python :: cp process fails when it shouldn't (another
>  > utility can download file)
>  >
>  > On 9/10/19 1:22 PM, Michal Kamil Simon wrote:
>  >> Hi Adrian,
>  > Hi!
>  >
>  >> Thanks for reporting this problem!
>  >>
>  >> I just pushed a patch that will allow for adding opaque info to
>  >> local files:
>  >> https://github.com/xrootd/xrootd/commit/45f4d8cf1cf2e7bdfc69c461dd47d45ae838748c
>  > great! thanks a lot!!!
>  >
>  >> with this fix you will be able to add the '?xrdcl.unzip=fn' cgi to
>  >> the local metalink (I ran some tests and it extracts the right file
>  >> correctly).
>  > So, would this metfile.meta4?xrdcl.unzip=zipped_file work for the python
>  > copyProcess?
>  >
>  > Also, would these parameters be added? (the url from metalink have the
>  > ALICE authorization envelope)
>  >
>  >> The patch will be included in 4.10.1 which we are now preparing.
>  > any chance of adding this feature also to the links from within the
>  > metalink? i do not know what is happening within, so i just ask :)
>  >
>  > If the unzipping is done on the server side, maybe it would make more
>  > sense to work with the actual url (from the metalink), the way that you
>  > initially told me to do?
>  >
>  > Thanks a lot!
>  > Adrian
>  >
>  >
>  >
>  >>
>  >> Cheers,
>  >> Michal
>  >> ________________________________________
>  >> From: Adrian Sevcenco
>  >> Sent: 09 September 2019 12:18
>  >> To: Michal Kamil Simon; [log in to unmask]
>  >> Subject: Re: python :: cp process fails when it shouldn't (another
>  >> utility can download file)
>  >>
>  >> On 9/9/19 12:56 PM, Michal Kamil Simon wrote:
>  >>> Hi Adrian,
>  >>>
>  >>> Could you try adding the /xrdcl.unzip=AliAOD.root/ cgi to the original
>  >>> metalink URL, like:
>  >>>
>  >>> file://localhost/home/adrian/tmp/_alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4?xrdcl.unzip=AliAOD.root
>  >> well, i do not think that the argument understands anything other
>  >> than a plain file:
>  >>
>  >> ll
>  >> total 3266976
>  >> -rw-r--r-- 1 adrian adrian 3345375231 Sep 9 13:06 AliAOD.root.zip
>  >> -rw-rw-r-- 1 adrian adrian 3289 Sep 9 13:02 _alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4
>  >>
>  >> xrdcp -p -P -f _alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4?xrdcl.unzip=AliAOD.root AliAOD.root
>  >> xrdcp: No such file or directory processing _alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4?xrdcl.unzip=AliAOD.root
>  >>
>  >> xrdcp -p -P -f file://localhost/_alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4?xrdcl.unzip=AliAOD.root AliAOD.root
>  >> xrdcp: No such file or directory processing /_alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4?xrdcl.unzip=AliAOD.root
>  >>
>  >> xrdcp -p -P -f file://localhost/${PWD}/_alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4?xrdcl.unzip=AliAOD.root AliAOD.root
>  >> xrdcp: No such file or directory processing //home/adrian/work-GRID/jalien_py/t/_alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4?xrdcl.unzip=AliAOD.root
>  >>
>  >> xrdcp -p -P -f file:///localhost/${PWD}/_alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4?xrdcl.unzip=AliAOD.root AliAOD.root
>  >> xrdcp: No such file or directory processing /localhost//home/adrian/work-GRID/jalien_py/t/_alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4?xrdcl.unzip=AliAOD.root
>  >>
>  >> xrdcp -p -P -f file://${PWD}/_alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4?xrdcl.unzip=AliAOD.root AliAOD.root
>  >> xrdcp: No such file or directory processing /home/adrian/work-GRID/jalien_py/t/_alice_data_2018_LHC18m_000291397_pass1_withTRDtracking_AOD208_0884_AliAOD.root.meta4?xrdcl.unzip=AliAOD.root
>  >>
>  >> Thank you!
>  >> Adrian
>  >>
>  >>>
>  >>> (it seems that there's a friction between the metalink handling and zip
>  >>> archive handling)
>  >>>
>  >>> Cheers,
>  >>> Michal
>  >>> ________________________________________
>  >>> From: Adrian Sevcenco
>  >>> Sent: 06 September 2019 16:30
>  >>> To: Michal Kamil Simon; [log in to unmask]
>  >>> Subject: Re: python :: cp process fails when it shouldn't (another
>  >>> utility can download file)
>  >>>
>  >>> Hi Michal!
>  >>>
>  >>> On 9/2/19 5:52 PM, Michal Kamil Simon wrote:
>  >>> > Yep, it looks about right :-)
>  >>> so, i use ?xrdcl.unzip=AliAOD.root&authz=TOKEN
>  >>> format but it seems that the zipfile is downloaded instead of file
>  >>> extracted from archive...
>  >>>
>  >>> the detailed log is here:
>  >>> https://cernbox.cern.ch/index.php/s/JNaLKsaC5pyrhMP
>  >>>
>  >>> do you have any idea/hint why the file is not extracted?
>  >>>
>  >>> Thanks a lot!!
>  >>> Adrian
>  >>>
>  >>>
>  >>>
>  >>> >
>  >>> > Michal
>  >>> > ________________________________________
>  >>> > From: Adrian Sevcenco
>  >>> > Sent: 02 September 2019 16:46
>  >>> > To: Michal Kamil Simon; [log in to unmask]
>  >>> > Subject: Re: python :: cp process fails when it shouldn't (another
>  >>> > utility can download file)
>  >>> >
>  >>> > On 9/2/19 5:29 PM, Michal Kamil Simon wrote:
>  >>> >> Hi Adrian,
>  >>> > Hi!
>  >>> >
>  >>> >> You can replace the '#' with '?xrdcl.unzip=' however you have to make
>  >>> >> sure that if the URL already contains a CGI you have to replace the
>  >>> >> following '?' with '&', e.g.:
>  >>> >>
>  >>> >> root://eosalice.cern.ch:1094//15/62933/56da6906-9149-11e7-ba1b-579516ed5c66*#*AliAOD.root*?*authz=LONG_ALICE_TOKEN
>  >>> >>
>  >>> >> gets transformed into:
>  >>> >>
>  >>> >> root://eosalice.cern.ch:1094//15/62933/56da6906-9149-11e7-ba1b-579516ed5c66*?xrdcl.unzip=*AliAOD.root*&*authz=LONG_ALICE_TOKEN
>  >>> >
>  >>> > yeap, it is doable as the full url (physical url + token) is
>  >>> > constructed by me... so i can do something like:
>  >>> >
>  >>> > # i have the guarantee that the files that ALICE uploads have
>  >>> > # no '#' in their names
>  >>> > pfn_components = pfn.split('#')
>  >>> >
>  >>> > if len(pfn_components) > 1:
>  >>> >     # use the part before '#', so that the '#file' suffix is dropped
>  >>> >     full_url = (pfn_components[0] + '?xrdcl.unzip=' + pfn_components[1]
>  >>> >                 + '&authz=LONG_ALICE_TOKEN')
>  >>> > else:
>  >>> >     full_url = pfn + '?authz=LONG_ALICE_TOKEN'
>  >>> >
>  >>> > Does it sound right?
>  >>> > Thanks a lot for help!!
>  >>> > Adrian
>  >>> >
>  >>> >
>  >>> >
>  >>> >>
>  >>> >>
>  >>> >>
>  >>> >> Regarding the /CopyProcess.add_job(...)/ method I could add parameters
>  >>> >> that will allow to specify the file name for extraction from zip
>  >>> >> archive.
>  >>> >>
>  >>> >> Regarding supporting the '#' root native format we will have to see
>  >>> >> with Andy whether this won't harm any existing use cases (as # is a
>  >>> >> legal character that could be used in a file name).
>  >>> >>
>  >>> >> Cheers,
>  >>> >> Michal
>  >>> >> ________________________________________
>  >>> >> From: Adrian Sevcenco
>  >>> >> Sent: 02 September 2019 15:37
>  >>> >> To: Michal Kamil Simon; [log in to unmask]
>  >>> >> Subject: Re: python :: cp process fails when it shouldn't (another
>  >>> >> utility can download file)
>  >>> >>
>  >>> >> On 9/2/19 2:09 PM, Michal Kamil Simon wrote:
>  >>> >> > Hi Adrian,
>  >>> >> Hi!
>  >>> >>
>  >>> >> > From what I see in the logs you use the following file name:
>  >>> >> >
>  >>> >> > root://eosalice.cern.ch:1094//15/62933/56da6906-9149-11e7-ba1b-579516ed5c66#AliAOD.root
>  >>> >> >
>  >>> >> > The '#' is root syntax for unpacking root files, this is not
>  >>> >> > supported in the xrootd client, instead you have to use the
>  >>> >> > /xrdcl.unzip/ cgi tag, e.g.
>  >>> >> >
>  >>> >> > root://eosalice.cern.ch:1094//15/62933/56da6906-9149-11e7-ba1b-579516ed5c66?xrdcl.unzip=AliAOD.root
>  >>> >>
>  >>> >> oh!!! so, could i use a simplistic logic like:
>  >>> >> replace the last '#' in the string with '?xrdcl.unzip='
>  >>> >>
>  >>> >> ALICE stores files in the form of GUID (that last uid)
>  >>> >> and when i request access to a lfn i get the guid and the authz
>  >>> >> envelope for accessing the file ... so, it is guaranteed that i will
>  >>> >> always get a url with a GUID ...
>  >>> >>
>  >>> >> Given this, do you think that i could use the logic from above?
>  >>> >>
>  >>> >> > alternatively I can expose extracting of zip files (root files use
>  >>> >> > zip format for bundling) in the /CopyProcess.add_job(...)/ method.
>  >>> >> that would be great! if it is possible it would be best if
>  >>> >> the same format of '#file' is recognized (as this is the url that i
>  >>> >> get when requesting lfn access)
>  >>> >>
>  >>> >> Thanks a lot!!
>  >>> >> Adrian
>  >>> >>
>  >>> >> >
>  >>> >> > Hope this helps!
>  >>> >> >
>  >>> >> > Cheers,
>  >>> >> > Michal
>  >>> >> >
>  >>> >> > ________________________________________
>  >>> >> > From: Adrian Sevcenco
>  >>> >> > Sent: 01 September 2019 22:13
>  >>> >> > To: [log in to unmask]
>  >>> >> > Cc: Michal Kamil Simon
>  >>> >> > Subject: python :: cp process fails when it shouldn't (another
>  >>> >> > utility can download file)
>  >>> >> >
>  >>> >> > Hi! I have a really baffling situation where my python tool cannot
>  >>> >> > download a file and another tool (java based, uses xrdcp) can
>  >>> >> > download the same file ...
>  >>> >> >
>  >>> >> > the detailed logs for my cp are here :
>  >>> >> > https://cernbox.cern.ch/index.php/s/JNaLKsaC5pyrhMP
>  >>> >> >
>  >>> >> > the java based tool seems to somehow ignore the external XRD_
>  >>> >> > variables so i cannot get a log of its cp process
>  >>> >> >
>  >>> >> > Could some expert take a look please and point me to a hint why
>  >>> >> > my cp fails and the other tool can download just fine?
>  >>> >> >
>  >>> >> > Thank you!!
>  >>> >> > Adrian
>  >>> >> >
> 


-- 
----------------------------------------------
Adrian Sevcenco, Ph.D.                       |
Institute of Space Science - ISS, Romania    |
adrian.sevcenco at {cern.ch,spacescience.ro} |
----------------------------------------------

