Dear TBB experts,
I would like to know the best solution to the following simple problem. I want to perform a parallel_for with a work functor that requires some workspace (e.g. an array that must not be accessed concurrently).
1) The simplest solution is to use an automatic vector inside operator() of my functor:
struct WorkFunctor{
    void operator()(blocked_range..){
        // workspace is heap-allocated and freed on every call
        std::vector<double> workspace(2048);
        ... do some work
    }
};
Unfortunately, with a small grain size, this solution is slow because of the time spent allocating and deallocating the workspace array.
2) I may improve the situation by using a TBB allocator (tbb::cache_aligned_allocator here):
struct WorkFunctor{
    void operator()(blocked_range..){
        // still one allocation per call, but through the TBB allocator
        std::vector<double,tbb::cache_aligned_allocator<double> > workspace(2048);
        ... do some work
    }
};
3) I improve the performance a bit by using a fixed-size array on the stack, but I have to be very careful with the stack size per thread (I have encountered erratic bugs due to this issue):
struct WorkFunctor{
    void operator()(blocked_range..){
        // 2048 doubles = 16 KB taken from the calling thread's stack
        double workspace[2048];
        ... do some work
    }
};
4) I wonder whether thread-specific storage (tbb::enumerable_thread_specific) is the appropriate solution. I have found very few examples of it.
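To make option 4 concrete, here is a minimal sketch of the idea as I understand it. It is only an illustration under assumptions: it uses the standard C++ thread_local keyword as a stand-in for tbb::enumerable_thread_specific, and two std::thread objects stand in for parallel_for. The (begin, end) signature, the out member and the run_demo helper are hypothetical names of my own, not part of TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical work functor; names and sizes are illustrative only.
struct WorkFunctor {
    std::vector<double>* out;  // one result slot per index, sized by the caller

    // Processes indices [begin, end); stands in for operator()(blocked_range).
    void operator()(std::size_t begin, std::size_t end) const {
        assert(begin <= end && end <= out->size());
        // Constructed once per thread on first use, then reused by every
        // later call that runs on the same thread: no per-call allocation
        // and no large array on the stack.
        thread_local std::vector<double> workspace(2048);
        for (std::size_t i = begin; i < end; ++i) {
            // Placeholder for "do some work": fill the workspace and sum it.
            std::fill(workspace.begin(), workspace.end(), static_cast<double>(i));
            (*out)[i] = std::accumulate(workspace.begin(), workspace.end(), 0.0);
        }
    }
};

// Runs the functor on two threads over disjoint halves of [0, n).
std::vector<double> run_demo(std::size_t n) {
    std::vector<double> out(n, 0.0);
    WorkFunctor f{&out};
    std::thread t1(f, std::size_t{0}, n / 2);
    std::thread t2(f, n / 2, n);
    t1.join();
    t2.join();
    return out;
}
```

With tbb::enumerable_thread_specific, I assume the equivalent would be to keep one such object outside the functor and call its local() member inside operator() to get the calling thread's vector, but I have not verified that this is the recommended usage.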
Thank you in advance for your help.