Thank you Nathan!

Really appreciate all the time and effort you have put into this project.

Enjoy the long weekend,
Norman


From: Hps-analysis <[log in to unmask]> on behalf of Nathan Baltzell <[log in to unmask]>
Sent: Friday, January 14, 2022 10:18 AM
To: [log in to unmask] <[log in to unmask]>; hps-software <[log in to unmask]>
Subject: Re: [Hps-analysis] 2021 trigger skims
 
Hello Everyone,

This is now ~99.5% complete; all jobs have finished.  The data should be on tape within a couple of weeks, or I can trigger that to happen sooner; meanwhile it's a mix of /cache and /mss:

/cache/hallb/hps/physrun2021/production/evio-skims -> /mss/hallb/hps/physrun2021/production/evio-skims

I have not pinned any of it, which means it will get automatically deleted from /cache disk eventually (but not before it's on tape, of course).  HPS currently has a pin quota of 60 TB, and these skims total 85 TB (as expected, about 10% of the full EVIO).  Getting a few TB off tape here and there isn't usually a significant bottleneck, but if people want to pin some of this stuff, let's discuss a plan.

FYI, this workflow got 4-5 times faster last Friday, because it led Scicomp to find an issue with tape job ordering that caused files not to be read contiguously (something to do with mysql/mariadb sorting changes over the years; the seek fraction in this case went from ~80% to basically 0% after the fix), so that was a nice outcome for everyone going forward.  It then slowed down considerably due to temporary Lustre and tape issues starting on Saturday and lingering for a few days.  So the 20-day estimate in the previous email turned out about right, even though the system changed greatly.

The remaining 0.5% lives unmerged at /work/hallb/hps/baltzell/physrun2021/evio-skims for now.  It's a mix of system errors (e.g., Lustre) and some unskimmed files that really contain no such trigger bits.  I'll probably just blindly resubmit them and assume what comes out is right, provided there are no more system errors during those jobs.

-Nathan


> On Dec 26, 2021, at 3:26 PM, Nathan Baltzell <[log in to unmask]> wrote:
>
> Hello Everyone,
>
> After assessing tape performance for these jobs at larger scale (maybe related to larger-than-expected incoming data from the other halls in December), I killed the workflow a couple of weeks ago and refactored it: the skim jobs are now 1:1 with EVIO files for tape performance (which required accommodating void outputs on some files), with independent 100:1 merger jobs to write the final files to /cache and clean up the temporaries on /volatile.  I resumed a few days ago.
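>
> For illustration, the 100:1 merge step amounts to something like the sketch below (the /volatile path, the skim-file naming, and the merge method itself are placeholders, not the actual production scripts):
>
> import glob, os, shutil
>
> SRC = '/volatile/hallb/hps/<temporary-skim-area>'            # placeholder
> DST = '/cache/hallb/hps/physrun2021/production/evio-skims'   # final outputs
>
> def merge_files(inputs, output):
>     # Assumes skim files can simply be concatenated back-to-back; if that
>     # is not true for EVIO, swap in the real merger tool here.
>     with open(output, 'wb') as out:
>         for path in inputs:
>             with open(path, 'rb') as f:
>                 shutil.copyfileobj(f, out)
>
> def merge_skim(skim, group_size=100):
>     parts = sorted(glob.glob(os.path.join(SRC, f'*_{skim}.evio')))
>     for i in range(0, len(parts), group_size):                # 100:1 merging
>         batch = parts[i:i + group_size]
>         merge_files(batch, os.path.join(DST, f'{skim}_{i // group_size:04d}.evio'))
>         for p in batch:                                       # clean up /volatile temporaries
>             os.remove(p)
>
> for skim in ('fee', 'moll', 'muon', 'rndm'):
>     merge_skim(skim)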
>
> Based on recent performance, there should be about 20 days remaining on these skims, dictated by tape access.  I suspect that can be reduced a good bit, maybe 2x, by ordering future jobs by position on tape.  In this particular case of trigger-bit skimming, there's a competing issue of a larger temporary disk footprint before merging.
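>
> Ordering by tape position could be as simple as sorting the submission list by (volume, position) before batching; a rough sketch, where tape_location() stands in for whatever metadata source actually provides that information:
>
> def tape_location(mss_path):
>     """Return (tape volume, position on that tape) for a file under /mss.
>     Placeholder: fill in with the real stub-file or Scicomp metadata lookup."""
>     raise NotImplementedError
>
> def order_for_submission(mss_paths):
>     # Group files by tape volume and read each volume front-to-back,
>     # instead of seeking back and forth.
>     return sorted(mss_paths, key=tape_location)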
>
> Regarding the 5 batches: the first one I truncated at 50% for the refactoring a couple of weeks ago, and it will be cleaned up later; the second one started 2 days ago and is about 70% complete.
>
> Final outputs will always be here:
>
> /cache/hallb/hps/physrun2021/production/evio-skims
>
> -Nathan
>
>
>> On Dec 7, 2021, at 10:03 PM, Nathan Baltzell <[log in to unmask]> wrote:
>>
>> Hello All,
>>
>> After some further preparations, the 2021 trigger skims are launched.
>>
>> Outputs will be going to /cache/hallb/hps/physrun2021/production/evio-skims.
>>
>> I broke the run list from Norman into 5 lists, and started with the first 20% in one batch, all submitted.  I'll proceed to the other 4 batches over the holidays, assessing tape usage as we go.
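>>
>> (The batching is just the run list split into 5 roughly equal, contiguous pieces, something along these lines; the filename is a placeholder:)
>>
>> with open('runlist_2021.txt') as f:
>>     runs = [line.strip() for line in f if line.strip()]
>>
>> nbatch = 5
>> size = (len(runs) + nbatch - 1) // nbatch
>> batches = [runs[i * size:(i + 1) * size] for i in range(nbatch)]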
>>
>> -Nathan
>>
>>> On Nov 29, 2021, at 3:39 PM, Nathan Baltzell <[log in to unmask]> wrote:
>>>
>>> The 10x larger test is done at /volatile/hallb/hps/baltzell/trigtest3
>>>
>>> -Nathan
>>>
>>>
>>>> On Nov 29, 2021, at 2:52 PM, Nathan Baltzell <[log in to unmask]> wrote:
>>>>
>>>> Hello All,
>>>>
>>>> Before running over the entire 2021 data set, I ran some test jobs using Maurik’s EVIO trigger bit skimmer.  Here’s the fraction of events kept in run 14750 for each skim:
>>>>
>>>> fee 2.0%
>>>> moll 3.3%
>>>> muon 1.9%
>>>> rndm 2.9%
>>>>
>>>> In each case, it’s inclusive of all such types, e.g., moll=moll+moll_pde+moll_pair, rndm=fcup+pulser.
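>>>>
>>>> In code, that inclusive grouping amounts to something like the following (only the moll and rndm constituents are spelled out above; fee and muon are analogous):
>>>>
>>>> # Sketch of the inclusive skim definitions: an event enters a skim if any
>>>> # of its fired trigger names belong to that skim's group.
>>>> SKIM_GROUPS = {
>>>>     'moll': {'moll', 'moll_pde', 'moll_pair'},
>>>>     'rndm': {'fcup', 'pulser'},
>>>>     # 'fee', 'muon': all fee-/muon-type triggers (not enumerated here)
>>>> }
>>>>
>>>> def skims_for(trigger_names):
>>>>     """Return the set of skims an event belongs to, given its fired trigger names."""
>>>>     fired = set(trigger_names)
>>>>     return {skim for skim, names in SKIM_GROUPS.items() if names & fired}
>>>>
>>>> def kept_fraction(events, skim):
>>>>     """Fraction of events kept by one skim; `events` is a list of trigger-name lists."""
>>>>     return sum(skim in skims_for(t) for t in events) / len(events) if events else 0.0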
>>>>
>>>> Are those numbers in line with expectations?  The total is 10%, which is not a problem if these skims are expected to be useful.  The outputs are at /volatile/hallb/hps/baltzell/trigtest2 if people are interested in checking things.
>>>>
>>>> A 10x larger test is running now, going to /volatile/hallb/hps/baltzell/trigtest3, and should be done in the next couple of hours.
>>>>
>>>> ************
>>>>
>>>> Note: it would be prudent to do this *only* for production runs, those that would be used in physics analysis, to avoid unnecessary tape access.  By that I mean removing junk runs, keeping only those with some significant number of events, and only keeping those with physics trigger settings (not special runs).  For that we need a run list.  I think we have close to a PB, but I remember hearing at the collaboration meeting that at least 20% is not useful for the purposes of trigger bit skimming.
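>>>>
>>>> Once we have that run list, the selection could be something like the following (the column names, the trigger-config label, and the event-count cut are placeholders):
>>>>
>>>> import csv
>>>>
>>>> MIN_EVENTS = 1_000_000   # "significant number of events" -- placeholder threshold
>>>>
>>>> def production_runs(runlist_csv):
>>>>     # Keep only production-quality runs: drop junk/short runs and
>>>>     # anything not taken with the physics trigger settings.
>>>>     keep = []
>>>>     with open(runlist_csv) as f:
>>>>         for row in csv.DictReader(f):
>>>>             if int(row['nevents']) < MIN_EVENTS:
>>>>                 continue
>>>>             if row['trigger_config'] != 'physics':
>>>>                 continue
>>>>             keep.append(int(row['run']))
>>>>     return sorted(keep)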
>>>>
>>>> -Nathan
>>>
>>
>


