I am doing a machine learning project in Python, so I need a parallel predict function, which I use in my program:
from multiprocessing.dummy import Pool
from multiprocessing import cpu_count

def multi_predict(X, predict, *args, **kwargs):
    pool = Pool(cpu_count())
    results = pool.map(predict, X)
    pool.close()
    pool.join()
    return results
The problem is that all my CPUs are loaded at only 20-40% (100% in total). I use multiprocessing.dummy because I ran into problems pickling the function with the multiprocessing module.
Answers:
Method 1
When you use multiprocessing.dummy, you’re using threads, not processes:
multiprocessing.dummy replicates the API of multiprocessing but is no
more than a wrapper around the threading module.
That means you’re restricted by the Global Interpreter Lock (GIL), and only one thread can actually execute CPU-bound operations at a time. That’s going to keep you from fully utilizing your CPUs. If you want to get full parallelism across all available cores, you’re going to need to address the pickling issue you’re hitting with multiprocessing.Pool.
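The question does not say what the pickling error was, so the following is only a sketch under an assumption: a common cause is passing a lambda or nested function, which pickle cannot serialize by reference. Defining the prediction function at module level usually avoids that. Here predict_one is a hypothetical stand-in for the real model call, and the processes parameter is an illustrative addition that is not in the original code.

```python
from multiprocessing import Pool, cpu_count


def predict_one(row):
    """Hypothetical stand-in for a real per-sample prediction.

    It lives at module level so that pickle can locate it by name
    when the pool sends work to the worker processes.
    """
    return sum(row)


def multi_predict(X, predict, processes=None):
    """Apply `predict` to every item of X using worker processes."""
    # Fall back to one worker per CPU when no count is given.
    with Pool(processes or cpu_count()) as pool:
        return pool.map(predict, X)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # when they import the main module (required with the spawn start method).
    X = [[1, 2], [3, 4], [5, 6]]
    print(multi_predict(X, predict_one))
```

If the real predict is a bound method of a model object, the object itself also has to be picklable, and it is copied to each worker, which can be costly for large models.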
Note that multiprocessing.dummy might still be useful if the work you need to parallelize is IO bound, or utilizes a C-extension that releases the GIL. For pure Python code, however, you’ll need multiprocessing.
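As a rough illustration of the I/O-bound case, the sketch below uses the thread-based pool with time.sleep standing in for a blocking I/O call (sleep releases the GIL while it waits). The names fake_io_task and run_threaded are hypothetical and used only for this example.

```python
import time
from multiprocessing.dummy import Pool  # thread-based pool, same API as Pool


def fake_io_task(delay):
    """Simulate a blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)
    return delay


def run_threaded(delays, workers=4):
    """Run the simulated I/O tasks concurrently on a pool of threads."""
    with Pool(workers) as pool:
        # map preserves the order of the inputs in its results.
        return pool.map(fake_io_task, delays)


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_threaded([0.2, 0.2, 0.2, 0.2])
    elapsed = time.perf_counter() - start
    # With four threads the waits overlap, so the total is typically
    # close to one delay rather than the sum of all four.
    print(results, round(elapsed, 2))
```

Because nothing is sent to another process, there is no pickling involved here, so lambdas and nested functions work as well.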
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.