Critical path: the output format
Now, at the end of this first week, I am facing an issue that is not exactly trivial. I was about to write PisaBBC::stepManager (the piece of code which is executed for each Detector at each tracking step, i.e. the former gustep in Geant3 land) when I realized that I need a hit class (and a hit container class) for that. Before embarking on that, I gave some thought to the output structure (ideally a phool node tree; I guess we'll all agree on this). So I tried to understand how it's done now in Pisa. If I read the code correctly, each event is split (at gukine level) into subevents (how many particles per subevent?), each subevent is tracked, and the resulting hit zebra banks (!) are appended to the PisaEVENT master object (which handles the subsystem hit classes). This is repeated for each subevent and, in the end, the result is written to a ROOT file.
Here are a few questions I asked to be sure I understand all the implications, together with my interpretation of Charlie's answers:
1. When the zebra banks are copied to PisaEVENT, the memory usage jumps (because hits are "duplicated" at that moment). Is that true?
   - True, but that's only a consequence of the moving framework syndrome... This step should for sure disappear.
2. This model assumes that all hits for a given event (whatever the number of primaries) can fit in memory. Is that true?
   - No. Only hits for the primaries and secondaries of the current subevent are kept.
3. If the answer to 2 is yes, will that assumption change with the upgrade detectors (I am thinking here of both the TPC and the SVX)?
   - Irrelevant.
The answers to those questions are of prime importance for the design of the output classes. Basically, if we are sure that everything will always fit in memory, I think one hit vector per detector per event (as we have now, but a PHObject-compliant one, hopefully), containing all hits of that event, is fine. If not, well, we might have to consider keeping in memory only the hits for a single primary (here I assume that making a subevent = 1 primary is not a silly option). That probably means having an output tree where the assumption 1 entry = 1 event is no longer true: there would be N_i entries per event, where N_i is the number of primaries of the i-th event (or having 1 Tree per event?). Well, don't know if that's clearly explained or not...
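To make the "N_i entries per event" idea a bit more concrete, here is a minimal sketch in plain C++ (no ROOT, no PISA code). Everything in it is hypothetical: Hit, Entry, trackPrimary, writeEvent and hitsInEvent are names invented for the illustration, and a std::vector stands in for the output tree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal hit: only what the sketch needs.
struct Hit {
    int detector;
    double energy;
};

// One output entry = the hits of ONE primary, tagged with its event number.
struct Entry {
    int event;
    int primary;
    std::vector<Hit> hits;
};

// Stand-in for the tracking of one primary: produces (primary + 1) fake hits.
std::vector<Hit> trackPrimary(int primary) {
    std::vector<Hit> hits;
    for (int i = 0; i <= primary; ++i) {
        hits.push_back(Hit{0, 1.0 * i});
    }
    return hits;
}

// Track one event primary by primary and emit one entry per primary.
// In a real job each entry would be written out and dropped from memory;
// here the vector only stands in for the output file.
void writeEvent(int event, int nPrimaries, std::vector<Entry>& tree) {
    for (int p = 0; p < nPrimaries; ++p) {
        tree.push_back(Entry{event, p, trackPrimary(p)});
    }
}

// Reading back: an event is the set of entries sharing the same event number.
std::size_t hitsInEvent(const std::vector<Entry>& tree, int event) {
    std::size_t n = 0;
    for (const Entry& e : tree) {
        if (e.event == event) n += e.hits.size();
    }
    return n;
}
```

With this layout, an event with 3 primaries followed by one with 2 primaries gives 5 entries, and the reader has to regroup entries by event number, which is the price to pay for never holding a full event in memory.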
Well, maybe even before we start thinking about how to organize hits versus subevents in memory, we for sure need such a concept as a hit collection. By the way, this is not limited to the simulation part of PHENIX: if you replace hit by central track, you end up with a central track collection... but that's another story.
I have two schemes in mind to hold hits. Both are based on a generic templated base class and at least one concrete implementation, typically based on TClonesArray. Both are the equivalent of a simple vector, as I do not see where we would need something more elaborate and less efficient like a map. Yes, I'm aware MUTOO is using a lot of maps, but look, those are already there, so there's nothing to code! The advantage of having a template base class is that we only have one place to maintain (as compared to the tens of container classes we currently have). Also, with only one class at work, we can let it be a little bit more complex and/or clever than the typical ones we currently have.
- The first one is the "easy" one, i.e. more or less a generalisation of the kind of containers we currently use in offline (PHCentralTrack, or maybe more like emcClusterContainer): those are containers holding bare pointers and returning them naked to the user. Let's call the primary template PHObjectVector.
- The second one is more ambitious and more difficult to implement: here the idea would be to extend our containers to be STL compliant, which means we'd be able to use all the nice STL algorithms (sort, unique, partition, min_element...) on them. This requires that:
  - the containers return plain objects (and not pointers) that can be safely copied (the STL algorithms need copying, though unneeded copying is always avoided). In turn, this means we need something like a PHObject smart pointer, i.e. a plain object containing a pointer.
  - the containers can return iterators to themselves (almost all algorithms need a pair of iterators, typically from begin() to end() of the container).
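The two requirements of the second scheme can be illustrated with a small self-contained sketch. This is not the actual PHObjectHandle code: Hit, Handle, HandleVector, earlier and the tof() accessor are hypothetical names, and using std::shared_ptr inside the handle is an assumption made only to keep copies safe in the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical hit base class, standing in for something like PisaBBCHit.
class Hit {
public:
    explicit Hit(double tof) : tof_(tof) {}
    virtual ~Hit() {}
    double tof() const { return tof_; }
private:
    double tof_;
};

// A handle is a plain, cheaply copyable object wrapping a pointer.
// Assumption: shared ownership through std::shared_ptr.
template <class BASE>
class Handle {
public:
    explicit Handle(BASE* p = nullptr) : ptr_(p) {}
    BASE* operator->() const { return ptr_.get(); }
    BASE& operator*() const { return *ptr_; }
    bool isValid() const { return ptr_ != nullptr; }
private:
    std::shared_ptr<BASE> ptr_;
};

// Container of handles exposing begin()/end(), which is what the
// STL algorithms expect.
template <class BASE>
class HandleVector {
public:
    typedef Handle<BASE> value_type;
    typedef typename std::vector<value_type>::iterator iterator;

    // Takes ownership of p by wrapping it into a handle.
    void add(BASE* p) { data_.push_back(value_type(p)); }
    iterator begin() { return data_.begin(); }
    iterator end() { return data_.end(); }
    std::size_t size() const { return data_.size(); }
private:
    std::vector<value_type> data_;
};

// Ordering used for sorting: smallest tof() first.
inline bool earlier(const Handle<Hit>& a, const Handle<Hit>& b) {
    return a->tof() < b->tof();
}
```

After filling a HandleVector<Hit> called hits, a call such as std::sort(hits.begin(), hits.end(), earlier) reorders the handles (the hits themselves are not copied), and (*hits.begin())->tof() then gives the smallest value.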
Both are plain PHObjects (and thus offer the clone(), create(), identify(), isValid() and Reset() methods) and offer a common interface (plus some iterator related things in the second case). They are only templated wrt one BASE class (typically PisaXXXHit).
    template<class BASE>
    class PHObjectXXXVector : public PHObject
    {
    public:
      /// Create a new element at position index and return it.
      value_type add(const size_type& index);

      /// Place an existing element at position index and return it.
      value_type add(const size_type& index, const value_type& value);

      /// Get the i-th element.
      ZZZ get(const size_type& index) const;

      /// Resize the vector.
      void resize(const size_type& newsize);

      /// Get the vector's size.
      size_type size() const;
    };

Here XXX is empty (or Handle), value_type is BASE* (or PHObjectHandle<BASE>), and ZZZ is BASE* (or PHObjectHandle<BASE>&).
Whereas the base classes need only to know about the single object's base class, both implementations probably need to know about both the base and the concrete class. If I take the example of a PHObject-compliant BBC hit class (base class PisaBBCHit) it will have some versions (at least 1, e.g. PisaBBCHitv1), and a collection of such hits would be declared as:
    typedef PHObjectXXXVectorImplementation<PisaBBCHit,PisaBBCHitv1> BBCHitVector;
    BBCHitVector bbchits;
where PHObjectXXXVectorImplementation derives from PHObjectXXXVector.
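As an illustration of that split between base and implementation, here is a minimal sketch of the pointer flavour. It does not use PHObject or TClonesArray: the storage is a std::vector of std::unique_ptr, which is an assumption made to keep the example self-contained, and BBCHit, BBCHitv1, ObjectVector and ObjectVectorImplementation are hypothetical stand-ins for the real classes. The add(index, value) overload is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for PisaBBCHit and PisaBBCHitv1.
class BBCHit {
public:
    virtual ~BBCHit() {}
    virtual int version() const = 0;
};

class BBCHitv1 : public BBCHit {
public:
    int version() const override { return 1; }
};

// Interface: knows only the BASE class (pointer flavour: value_type is BASE*).
template <class BASE>
class ObjectVector {
public:
    typedef BASE* value_type;
    typedef std::size_t size_type;
    virtual ~ObjectVector() {}
    virtual value_type add(const size_type& index) = 0;
    virtual value_type get(const size_type& index) const = 0;
    virtual void resize(const size_type& newsize) = 0;
    virtual size_type size() const = 0;
};

// Implementation: knows BASE and CONCRETE, so it can create new elements.
// Assumption: std::vector<std::unique_ptr> replaces the TClonesArray storage.
template <class BASE, class CONCRETE>
class ObjectVectorImplementation : public ObjectVector<BASE> {
public:
    typedef typename ObjectVector<BASE>::value_type value_type;
    typedef typename ObjectVector<BASE>::size_type size_type;

    // Create a new CONCRETE element at position index, growing if needed.
    value_type add(const size_type& index) override {
        if (index >= data_.size()) data_.resize(index + 1);
        data_[index].reset(new CONCRETE);
        return data_[index].get();
    }
    // Return the element at index, or a null pointer if there is none.
    value_type get(const size_type& index) const override {
        return index < data_.size() ? data_[index].get() : nullptr;
    }
    void resize(const size_type& newsize) override { data_.resize(newsize); }
    size_type size() const override { return data_.size(); }
private:
    std::vector<std::unique_ptr<BASE>> data_;
};

typedef ObjectVectorImplementation<BBCHit, BBCHitv1> BBCHitVector;
```

User code only ever sees BBCHit pointers: after BBCHitVector hits; a call to hits.add(2) creates a BBCHitv1 in slot 2 and returns it as a BBCHit*, so moving to a later hit version would only require changing the typedef.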
I've started to implement both options, and I'm currently testing them. But before I diverge and lose too much time, I'd like to hear if you think one (or both!) option is out of bounds, and why. Bear in mind that both are under heavy work, so they are not fully functional yet...