This is rather the inverse of “What can you use Python generator functions for?”: Python generators, generator expressions, and the `itertools` module are some of my favorite features of Python these days. They’re especially useful when setting up chains of operations to perform on a big pile of data; I often use them when processing DSV files.
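To make the "chain of operations" idea concrete, here is a minimal sketch of a lazy pipeline over delimiter-separated data. The `|` delimiter, the `name` and `age` columns, and the helper names `read_rows` and `process` are all assumptions made up for illustration; nothing here comes from a particular project.

```python
import csv
import io
from itertools import islice

def read_rows(lines, delimiter="|"):
    """Lazily parse delimiter-separated lines into lists of fields."""
    yield from csv.reader(lines, delimiter=delimiter)

def process(lines, delimiter="|"):
    """Chain generator stages; nothing runs until the result is consumed."""
    rows = read_rows(lines, delimiter)
    header = next(rows)                                   # first row holds column names
    records = (dict(zip(header, row)) for row in rows)    # row -> dict
    adults = (r for r in records if int(r["age"]) >= 18)  # filter stage
    return (r["name"].upper() for r in adults)            # transform stage

# Hypothetical in-memory sample; an open file object would work the same way.
sample = io.StringIO("name|age\nada|36\nbob|12\ncy|50\n")
first_two = list(islice(process(sample), 2))  # pull only two results
```

Each stage handles one row at a time, so memory use stays flat however large the input is, and `islice` stops reading as soon as enough results have been produced.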

What is a good way to split a NumPy array randomly into training and testing/validation datasets? Something similar to the `cvpartition` or `crossvalind` functions in Matlab.
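One possible approach, sketched under the assumption that the samples are the rows of the array, is to shuffle the row indices once and slice them. The function name `split_train_test` and its parameters are hypothetical, and this is not an equivalent of the Matlab functions, only a simple hold-out split.

```python
import numpy as np

def split_train_test(data, test_fraction=0.2, seed=0):
    """Randomly split rows of `data` into (train, test) without overlap."""
    rng = np.random.default_rng(seed)           # seeded for reproducibility
    indices = rng.permutation(data.shape[0])    # shuffled row indices
    n_test = int(round(test_fraction * data.shape[0]))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return data[train_idx], data[test_idx]

data = np.arange(50).reshape(10, 5)             # 10 samples, 5 features
train, test = split_train_test(data, test_fraction=0.2)
```

Because both parts are taken from one permutation, every row ends up in exactly one of them. If labels live in a separate array, the same index arrays would have to be applied to it as well.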

Which approach is better? Using a tuple, like:

After many attempts at optimizing the code, it seems that one last resort would be to run the code below on multiple cores. I don’t know exactly how to convert or restructure my code so that it runs much faster using multiple cores, and I would appreciate guidance on achieving that. The end goal is to run this code as fast as possible for arrays A and B, where each array holds about 700,000 elements. Here is the code using small arrays. The 700k element arrays are commented out.

I’d appreciate some help in finding and understanding a pythonic way to optimize the following array manipulations in nested for loops:

I am training a CNN with TensorFlow for a medical imaging application.

What’s an efficient way, given a NumPy matrix (2D array), to return the minimum/maximum `n` values (along with their indices) in the array?
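A minimal sketch of one way to do this, assuming `1 <= n <= matrix.size`: `np.argpartition` selects the `n` smallest entries of the flattened array without a full sort, and `np.unravel_index` converts the flat positions back to (row, column) pairs. The function name `n_smallest` is hypothetical.

```python
import numpy as np

def n_smallest(matrix, n):
    """Return the n smallest values (ascending) and their (row, col) indices."""
    flat = matrix.ravel()
    # argpartition places the n smallest entries first, in arbitrary order
    candidates = np.argpartition(flat, n - 1)[:n]
    # sort only those n candidates so the result is ordered
    ordered = candidates[np.argsort(flat[candidates])]
    rows, cols = np.unravel_index(ordered, matrix.shape)
    return flat[ordered], list(zip(rows.tolist(), cols.tolist()))

m = np.array([[9, 4, 7],
              [1, 8, 3],
              [6, 2, 5]])
values, positions = n_smallest(m, 3)
# values    -> [1, 2, 3]
# positions -> [(1, 0), (2, 1), (1, 2)]
```

For the `n` largest values of a numeric array, the same function can be called on `-matrix` and the returned values negated again.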

The docs only say that the Python interpreter performs “basic optimizations”, without going into any detail. Obviously, it’s implementation dependent, but is there any way to get a feel for what kinds of things could be optimized, and how much run time that could save?

I’m writing some moderately performance-critical code in NumPy.

This code will be in the innermost loop of a computation whose run time is measured in hours.

A quick calculation suggests that this code will be executed up to something like 10^12 times in some variations of the calculation.

I’m trying to fit the distribution of some experimental values with a custom probability density function. Obviously, the integral of the resulting function should always be equal to 1, but the results of simple `scipy.optimize.curve_fit(function, dataBincenters, dataCounts)` never satisfy this condition.

What is the best way to solve this problem?
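One possible way to enforce the constraint, sketched here under several assumptions, is to build the normalization into the model itself: divide an unnormalized shape by its own integral, then fit that to a histogram scaled to a density rather than to raw counts. The Gaussian-like `shape`, the support `[LOWER, UPPER]`, the parameter bounds, and the synthetic data are all stand-ins for illustration, not the custom function from the question.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import curve_fit

LOWER, UPPER = 0.0, 10.0  # assumed support of the distribution

def shape(x, mu, sigma):
    """Unnormalized, hypothetical stand-in for the custom shape."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def pdf(x, mu, sigma):
    """Shape divided by its integral, so it integrates to 1 for any parameters."""
    norm, _ = quad(shape, LOWER, UPPER, args=(mu, sigma))
    return shape(x, mu, sigma) / norm

# Synthetic "experimental" values, used only to make the sketch runnable.
rng = np.random.default_rng(0)
samples = rng.normal(4.0, 1.0, size=5000)
samples = samples[(samples >= LOWER) & (samples <= UPPER)]

# density=True rescales the counts so the histogram itself integrates to 1
density, edges = np.histogram(samples, bins=40, range=(LOWER, UPPER), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Bounds keep sigma away from zero, where the numerical integral is unreliable.
params, _ = curve_fit(pdf, centers, density, p0=[5.0, 2.0],
                      bounds=([LOWER, 0.1], [UPPER, 5.0]))
area, _ = quad(pdf, LOWER, UPPER, args=tuple(params))  # close to 1 by construction
```

With this setup the fitted curve integrates to 1 whatever parameters the optimizer lands on, because normalization is part of the model rather than something the fit has to discover. The price is one numerical integration per model evaluation; a closed-form normalization constant, where one exists, would be cheaper.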