How can I efficiently process a numpy array in blocks, similar to Matlab’s blkproc (blockproc) function?

I’m looking for a good approach for efficiently dividing an image into small regions, processing each region separately, and then re-assembling the results from each process into a single processed image. Matlab had a tool for this called blkproc (replaced by blockproc in newer versions of Matlab).
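
The divide, process, and re-assemble steps described above can be sketched with plain NumPy reshaping. This is only a minimal illustration, not an equivalent of Matlab's blockproc: the function name `blockproc` and its parameters are hypothetical, and it assumes a 2-D image whose height and width are exact multiples of the block size and a `func` that returns a block of the same shape.

```python
import numpy as np

def blockproc(image, block_shape, func):
    """Apply func to each non-overlapping block and re-assemble the result.

    Hypothetical helper: assumes a 2-D image, block sizes that divide the
    image evenly, and a func that returns a block of the same shape.
    """
    bh, bw = block_shape
    h, w = image.shape
    if h % bh or w % bw:
        raise ValueError("image dimensions must be multiples of the block size")

    # (h, w) -> (rows, bh, cols, bw) -> (rows, cols, bh, bw)
    blocks = image.reshape(h // bh, bh, w // bw, bw).swapaxes(1, 2)

    # The output keeps the input dtype, so func results are cast to it.
    out = np.empty_like(blocks)
    for i in range(blocks.shape[0]):
        for j in range(blocks.shape[1]):
            out[i, j] = func(blocks[i, j])

    # Undo the axis swap and flatten back to the original (h, w) layout.
    return out.swapaxes(1, 2).reshape(h, w)

# Usage: replace every 2x2 block with its maximum value.
image = np.arange(16).reshape(4, 4)
result = blockproc(image, (2, 2), lambda b: np.full_like(b, b.max()))
```

The Python loop runs once per block, so this is reasonable for moderately sized blocks; when the per-block operation is a simple reduction, it can usually be applied directly over the last two axes of `blocks` instead of looping.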

How to get the cells of a sudoku grid with OpenCV?

I’ve been trying for the last few days to extract a sudoku grid from a picture, and I have been struggling to get the smaller squares of the grid.
I am working on the picture below. I thought processing the image with a Canny filter would work fine, but it didn’t, and I couldn’t get the contour of every square. I then tried adaptive thresholding, Otsu thresholding, and classic thresholding, but none of them captured every small square.
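
One possible alternative to detecting each small square is to locate only the outer grid and then divide it evenly. The sketch below covers just that last step and uses NumPy only. It assumes the grid has already been cropped and perspective-corrected into a roughly square image (for instance with OpenCV's `cv2.findContours` and `cv2.warpPerspective`, which are not shown here), and the function name `split_grid_cells` is hypothetical.

```python
import numpy as np

def split_grid_cells(grid, n=9):
    """Split an already-rectified grid image into n x n cell images.

    Hypothetical helper: assumes `grid` contains only the sudoku grid,
    already cropped and warped so that its cells are evenly spaced.
    """
    h, w = grid.shape[:2]
    # Evenly spaced cell boundaries, rounded to whole pixels.
    ys = np.round(np.linspace(0, h, n + 1)).astype(int)
    xs = np.round(np.linspace(0, w, n + 1)).astype(int)
    return [
        [grid[ys[r]:ys[r + 1], xs[c]:xs[c + 1]] for c in range(n)]
        for r in range(n)
    ]

# Usage with a placeholder image standing in for the rectified grid.
grid = np.zeros((90, 90), dtype=np.uint8)
cells = split_grid_cells(grid)
```

Each entry `cells[r][c]` is a view into the original image, so the cells still include the grid lines at their borders; trimming a few pixels from each edge before further processing may help.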