Hello Maurik,


Have you been able to run the profiler under equivalent conditions with the two jar files yet?

Norman


________________________________
From: [log in to unmask] <[log in to unmask]> on behalf of Maurik Holtrop <[log in to unmask]>
Sent: Sunday, May 21, 2017 7:04 AM
To: Rafayel Paremuzyan
Cc: Alessandra Filippi; hps-software; [log in to unmask]
Subject: Re: Test pass1 Status

Hello Rafo,

One thing that is probably different between the last time we ran with the 3.8 jar and now is the version of the Java VM. It could well be that the newer version of Java is not faster. Also, it is tricky to compare Endeavour with the JLab farm computers; they are probably not equivalent in speed. At UNH, Pumpkin has the more modern processors, whereas Endeavour is now ~5 years old.

Best,
Maurik

On May 21, 2017, at 6:54 AM, Rafayel Paremuzyan <[log in to unmask]> wrote:

Hi Alessandra, Norman, all

Thank you for your replies and your tests.

I tested both 2015 and 2016 data using the v4-4 detector on UNH computers.
I used the 3.8 JAR (the jar for 2015 pass6), the 3.9 JAR (the jar for 2016 pass0 recon), and the new jar, v051717 (the newest jar tag).

I also noticed that recon of 2015 data is faster than recon of 2016 data.
It also seems that the new jar is about 20% slower than the 3.9 jar for 2016 data, and about 60% slower for 2015 data.
The recon speed is now about 2.55 events/s for 2015 data, which is too slow:
it would take more than 40 h for a single file.

This is a summary of code speeds with the different jar files
(v4-4 detector, UNH (Endeavour), 5K events reconstructed):

                          3.8 JAR             3.9 JAR                  v051717 JAR
                          (2015 recon jar)    (2016 pass0 recon jar)   (jar for tpass1)
                          Events per second   Events per second        Events per second

2015 data 5772, file 20   5.07                5.19                     3.157
2016 data, file 25        -                   3.11                     2.53
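
The percentages quoted for the slowdown depend on whether one compares rates or processing times. As a minimal sketch (the function names are illustrative, and only the 3.9 and v051717 rates from the table are used), both conventions can be computed like this:

```python
def percent_rate_drop(baseline_rate, new_rate):
    """Percent decrease in events/s relative to the baseline jar."""
    return 100.0 * (baseline_rate - new_rate) / baseline_rate


def percent_extra_time(baseline_rate, new_rate):
    """Percent increase in time per event relative to the baseline jar."""
    return 100.0 * (baseline_rate / new_rate - 1.0)


# Rates in events/s, copied from the table (3.9 jar vs v051717 jar).
rates = {
    "2015 data": {"3.9": 5.19, "v051717": 3.157},
    "2016 data": {"3.9": 3.11, "v051717": 2.53},
}

for sample, r in rates.items():
    drop = percent_rate_drop(r["3.9"], r["v051717"])
    extra = percent_extra_time(r["3.9"], r["v051717"])
    print(f"{sample}: rate down {drop:.0f}%, time per event up {extra:.0f}%")
```

With the table values, the 2015 rate drops by about 39% (about 64% more time per event), and the 2016 rate drops by about 19% (about 23% more time per event).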


*However*, I looked into the job wall times for the pass0 recon.
The recon speed there was more than 7.4 events/s, which is about 3x faster than with the new JAR.

I checked the *same 3.9 jar* again, and it is slower now as well.
I don't know why the code speed is so low now!


Norman, I tried "-DdisableSvtAlignmentConstants", but it didn't work.

=================The command===============
java -XX:+UseSerialGC -cp hps-distribution-3.11-v051717-bin.jar org.hps.evio.EvioToLcio -x /org/hps/steering/recon/PhysicsRun2016FullRecon.lcsim -r -d HPS-PhysicsRun2016-v5-3-fieldmap_globalAlign -R 7796 -DoutputFile=out_7796_0 -DdisableSvtAlignmentConstants hps_007796.evio.25 -n 10000

============The error backtrace============
2017-05-21 00:45:39 [CONFIG] org.hps.evio.EvioToLcio parse :: using steering resource /org/hps/steering/recon/PhysicsRun2016FullRecon.lcsim
2017-05-21 00:45:39 [CONFIG] org.hps.evio.EvioToLcio parse :: set max events to 10000
2017-05-21 00:45:48 [INFO] org.hps.rundb.RunManager <init> :: ConnectionParameters { database: hps_run_db_v2, hostname: hpsdb.jlab.org, password: darkphoton, port: 3306, user: hpsuser }
2017-05-21 00:45:48 [CONFIG] org.lcsim.job.JobControlManager addVariableDefinition :: outputFile = out_7796_0
2017-05-21 00:45:48 [CONFIG] org.hps.evio.EvioToLcio parse :: set steering variable: outputFile=out_7796_0
2017-05-21 00:45:48 [SEVERE] org.hps.evio.EvioToLcio parse :: bad variable format: disableSvtAlignmentConstants
java.lang.IllegalArgumentException: Bad variable format: disableSvtAlignmentConstants
        at org.hps.evio.EvioToLcio.parse(EvioToLcio.java:393)
        at org.hps.evio.EvioToLcio.main(EvioToLcio.java:97)

Exception in thread "main" java.lang.IllegalArgumentException: Bad variable format: disableSvtAlignmentConstants
        at org.hps.evio.EvioToLcio.parse(EvioToLcio.java:393)
        at org.hps.evio.EvioToLcio.main(EvioToLcio.java:97)
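
The trace suggests that a -D option placed after the class name is read by EvioToLcio itself as a steering variable and must have the form name=value, as outputFile=out_7796_0 does. The sketch below is only a Python stand-in for that kind of check (the function name is hypothetical; this is not the hps-java code). If disableSvtAlignmentConstants was meant as a JVM system property, which is an assumption and not something confirmed in this thread, it would have to appear before -cp so that java, rather than EvioToLcio, consumes it.

```python
def parse_steering_variable(definition):
    """Split a 'name=value' definition; reject anything without both parts.

    Hypothetical stand-in for the check implied by the
    'Bad variable format' error, not the actual EvioToLcio code.
    """
    name, sep, value = definition.partition("=")
    if not sep or not name or not value:
        raise ValueError(f"Bad variable format: {definition}")
    return name, value


# The two -D arguments that followed the class name in the command.
for arg in ["outputFile=out_7796_0", "disableSvtAlignmentConstants"]:
    try:
        print("accepted:", parse_steering_variable(arg))
    except ValueError as err:
        print("rejected:", err)
```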

Rafo






On 05/20/2017 06:17 AM, Alessandra Filippi wrote:
Hi Rafo, all,
I also noticed that the reconstruction of 2016 data is about twice as slow as that of 2015 data (whichever geometry and reconstruction version is used).
This happens when I run the aligned geometry as well as the "current" one (v5.0), and also with the geometry taken from the db (the result is the same as for v5.0). I did not make any test with v4.4, though; as far as SVT alignment is concerned it should actually be the same as v5.0. Can you please try the same short test with the newest jar and v4.4?
This happens to me both with hps-java 5.10 and 5.11 (not the most updated one).

I would be surprised if it could be something connected to the alignment, unless for some reason new positions and harder tracks trigger some long loops in the reconstruction. But this happens (to me) also with the standard geometry, so a check with the one used for pass0 (which should however be equivalent to v5.0) could at least help to rule out, or put the blame on, the alignment step.
Thanks, cheers
    Alessandra


P.S. Also make sure that the correct fieldmap is called in all the compact files - you never know!



On Fri, 19 May 2017, Rafayel Paremuzyan wrote:

Hi All,

While testing the recon for test pass1,
I noticed that the recon time is more than x2 longer than the pass0 recon time.

To demonstrate it,
I submitted 3 simple jobs with 10K events to reconstruct, with the new pass1 xml
file (this has the new jar v051717, and the new detector
HPS-PhysicsRun2016-v5-3-fieldmap_globalAlign),
and the old pass0 xml file (pass0 jar release 3.9, and the detector
HPS-PhysicsRun2016-Nominal-v4-4-fieldmap).

Below is a printout from the job with the new JAR, v051717. The average time
for 1000 events is more than 7 minutes.
===================== LOG from the v051717 JAR ==============================
2017-05-19 09:36:51 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10614074 with sequence 0
2017-05-19 09:43:13 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10615074 with sequence 1000
2017-05-19 09:49:18 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10616074 with sequence 2000
2017-05-19 09:55:54 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10617074 with sequence 3000
2017-05-19 10:02:55 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10618074 with sequence 4000
2017-05-19 10:09:57 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10619074 with sequence 5000
2017-05-19 10:16:13 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10620074 with sequence 6000
2017-05-19 10:25:20 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10621074 with sequence 7000
2017-05-19 10:32:56 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10622074 with sequence 8000
2017-05-19 10:36:19 [WARNING] org.hps.recon.tracking.TrackerReconDriver process :: Discarding track with bad HelicalTrackHit (correction distance 0.000000, chisq penalty 0.000000)
2017-05-19 10:42:03 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10623074 with sequence 9000
2017-05-19 10:47:44 [INFO] org.hps.evio.EvioToLcio run :: maxEvents 10000 was reached
2017-05-19 10:47:44 [INFO] org.lcsim.job.EventMarkerDriver endOfData :: 10000 events processed in job.
2017-05-19 10:47:44 [INFO] org.hps.evio.EvioToLcio run :: Job finished successfully!


And below is the job log info from the pass0 jar. The average time for 1000
events is just under 4 minutes (10000 events in about 39 minutes).
===================== LOG from the 3.9 release JAR ==============================
2017-05-19 13:19:46 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10614074 with sequence 0
2017-05-19 13:23:36 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10615074 with sequence 1000
2017-05-19 13:27:03 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10616074 with sequence 2000
2017-05-19 13:30:40 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10617074 with sequence 3000
2017-05-19 13:34:20 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10618074 with sequence 4000
2017-05-19 13:38:11 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10619074 with sequence 5000
2017-05-19 13:41:43 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10620074 with sequence 6000
2017-05-19 13:45:54 [WARNING] org.hps.recon.tracking.TrackerReconDriver process :: Discarding track with bad HelicalTrackHit (correction distance 0.000000, chisq penalty 0.000000)
2017-05-19 13:46:05 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10621074 with sequence 7000
2017-05-19 13:50:08 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10622074 with sequence 8000
2017-05-19 13:55:03 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10623074 with sequence 9000
2017-05-19 13:58:27 [INFO] org.hps.evio.EvioToLcio run :: maxEvents 10000 was reached
2017-05-19 13:58:27 [INFO] org.lcsim.job.EventMarkerDriver endOfData :: 10000 events processed in job.
2017-05-19 13:58:27 [INFO] org.hps.evio.EvioToLcio run :: Job finished successfully!
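
To compare the two jobs without reading timestamps by eye, the EventMarkerDriver lines can be turned into an events-per-second figure. The following is a minimal sketch, assuming the log format shown in the two printouts; the function name and the two-line excerpts are illustrative. Whitespace is collapsed first, so log lines that were hard-wrapped in a mail are handled too.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2017-05-19 09:36:51 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10614074 with sequence 0
MARKER = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[INFO\] "
    r"org\.lcsim\.job\.EventMarkerDriver process :: "
    r"Event \d+ with sequence (\d+)"
)


def events_per_second(log_text):
    """Average rate between the first and last event marker in a log."""
    flat = " ".join(log_text.split())  # undo hard line wraps
    markers = [
        (datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), int(seq))
        for stamp, seq in MARKER.findall(flat)
    ]
    if len(markers) < 2:
        raise ValueError("need at least two event markers")
    (t0, n0), (t1, n1) = markers[0], markers[-1]
    return (n1 - n0) / (t1 - t0).total_seconds()


# First and last markers from each of the two logs.
new_jar_log = """
2017-05-19 09:36:51 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10614074 with sequence 0
2017-05-19 10:42:03 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10623074 with sequence 9000
"""
old_jar_log = """
2017-05-19 13:19:46 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10614074 with sequence 0
2017-05-19 13:55:03 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 10623074 with sequence 9000
"""

print(f"v051717 jar: {events_per_second(new_jar_log):.2f} events/s")
print(f"3.9 jar:     {events_per_second(old_jar_log):.2f} events/s")
```

For these two jobs the first-to-last marker intervals are 3912 s and 2117 s for 9000 events, i.e. roughly 2.3 and 4.3 events/s.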

I also tried to run the reconstruction myself interactively, but I am getting
the error below.

The command
/apps/scicomp/java/jdk1.7/bin/java -XX:+UseSerialGC -cp hps-distribution-3.9-bin.jar org.hps.evio.EvioToLcio -x /org/hps/steering/recon/PhysicsRun2016FullRecon.lcsim -r -d HPS-PhysicsRun2016-v5-3-fieldmap_globalAlign -R 7796 -DoutputFile=out_7796_0 hps_007796.evio.0 -n 10000

The Error traceback
2017-05-19 14:58:44 [CONFIG] org.hps.evio.EvioToLcio parse :: using steering resource /org/hps/steering/recon/PhysicsRun2016FullRecon.lcsim
2017-05-19 14:58:44 [CONFIG] org.hps.evio.EvioToLcio parse :: set max events to 10000
2017-05-19 14:58:45 [CONFIG] org.lcsim.job.JobControlManager addVariableDefinition :: outputFile = out_7796_0
2017-05-19 14:58:45 [CONFIG] org.hps.evio.EvioToLcio parse :: set steering variable: outputFile=out_7796_0
2017-05-19 14:58:45 [CONFIG] org.lcsim.job.JobControlManager initializeLoop :: initializing LCSim loop
2017-05-19 14:58:45 [CONFIG] org.lcsim.job.JobControlManager initializeLoop :: Event marker printing disabled.
2017-05-19 14:58:45 [INFO] org.hps.conditions.database.DatabaseConditionsManager resetInstance :: DatabaseConditionsManager instance is reset
Exception in thread "main" java.lang.UnsatisfiedLinkError: /u/apps/scicomp/java/jdk1.7.0_75/jre/lib/i386/xawt/libmawt.so: libXext.so.6: cannot open shared object file: No such file or directory
        at java.lang.ClassLoader$NativeLibrary.load(Native Method)
        at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1890)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1851)
        at java.lang.Runtime.load0(Runtime.java:795)
        at java.lang.System.load(System.java:1062)
        at java.lang.ClassLoader$NativeLibrary.load(Native Method)
        at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1890)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1872)
        at java.lang.Runtime.loadLibrary0(Runtime.java:849)
        at java.lang.System.loadLibrary(System.java:1088)
        at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:67)
        at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:47)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.awt.Toolkit.loadLibraries(Toolkit.java:1653)
        at java.awt.Toolkit.<clinit>(Toolkit.java:1682)
        at java.awt.Component.<clinit>(Component.java:595)
        at org.lcsim.util.aida.AIDA.<init>(AIDA.java:68)
        at org.lcsim.util.aida.AIDA.defaultInstance(AIDA.java:53)
        at org.hps.evio.RfFitterDriver.<init>(RfFitterDriver.java:31)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:379)
        at org.lcsim.job.JobControlManager.setupDrivers(JobControlManager.java:1199)
        at org.hps.job.JobManager.setupDrivers(JobManager.java:82)
        at org.lcsim.job.JobControlManager.setup(JobControlManager.java:1052)
        at org.lcsim.job.JobControlManager.setup(JobControlManager.java:1110)
        at org.hps.evio.EvioToLcio.parse(EvioToLcio.java:407)
        at org.hps.evio.EvioToLcio.main(EvioToLcio.java:97)



I see this library, libXext.so.6, in /usr/lib64 but not in /usr/lib.
When I put /usr/lib64 in my LD_LIBRARY_PATH, it complains again (see
below):

Exception in thread "main" java.lang.UnsatisfiedLinkError: /u/apps/scicomp/java/jdk1.7.0_75/jre/lib/i386/xawt/libmawt.so: libXext.so.6: wrong ELF class: ELFCLASS64
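
The two messages are consistent with a 32-bit JVM (its library path contains i386) being offered the 64-bit libXext.so.6 from /usr/lib64. The ELF class of each file can be read from byte 4 of its header (EI_CLASS: 1 for 32-bit, 2 for 64-bit). The sketch below uses a hypothetical helper name and the paths mentioned in this mail; files that do not exist on the machine are simply reported as missing.

```python
import os

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class(path):
    """Return the ELF class name of a file, or None if it is not an ELF file."""
    with open(path, "rb") as handle:
        ident = handle.read(5)  # 4 magic bytes + EI_CLASS
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(ident[4], "unknown")


if __name__ == "__main__":
    # Paths taken from the error messages and the text of this mail.
    candidates = [
        "/u/apps/scicomp/java/jdk1.7.0_75/jre/lib/i386/xawt/libmawt.so",
        "/usr/lib64/libXext.so.6",
        "/usr/lib/libXext.so.6",
    ]
    for path in candidates:
        if os.path.exists(path):
            print(path, "->", elf_class(path))
        else:
            print(path, "-> missing")
```

If libmawt.so reports ELFCLASS32 and the only available libXext.so.6 reports ELFCLASS64, the JVM and the library cannot be mixed; either a 64-bit JVM or a 32-bit libXext would be needed.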

I would appreciate some help with running the reconstruction
interactively; then I could look more closely into the logs
of the old and new JAR files.

Rafo


____________________________________________________________________________

Use REPLY-ALL to reply to list

To unsubscribe from the HPS-SOFTWARE list, click the following link:
https://urldefense.proofpoint.com/v2/url?u=https-3A__listserv.slac.stanford.edu_cgi-2Dbin_wa-3FSUBED1-3DHPS-2DSOFTWARE-26A-3D1&d=DwIDaQ&c=lz9TcOasaINaaC3U7FbMev2lsutwpI4--09aP8Lu18s&r=0HDJrGO9TZQTE97J9Abt2A&m=xnbGP5VHYWRAQRWRksVgMnYvBkXWI4roLxztdJ0Tp9I&s=ppNYedSrn5DPaIZZJgRZu8tBDeSjroqbj_PoevFoFpI&e=


