scikit-learn Archives - Page 2 of 5

How to get most informative features for scikit-learn classifiers?

August 18, 2022 by Magenaut

The classifiers in machine learning packages like liblinear and nltk offer a method show_most_informative_features(), which is really helpful for debugging features:

scikit-learn & statsmodels – which R-squared is correct?

August 18, 2022 by Magenaut

I’d like to choose the best algorithm for future use. I found some solutions, but I don’t understand which R-squared value is correct.
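The excerpt does not show how the two values were computed, so the cause of the mismatch is unknown here. As a hedged illustration of how two R-squared figures can differ for the same predictions, the sketch below computes the coefficient of determination by hand as 1 - SS_res / SS_tot, once with a centered total sum of squares (spread around the mean) and once with an uncentered one (spread around zero). The helper name `r_squared` and the toy values are assumptions made for illustration only.

```python
import numpy as np


def r_squared(y_true, y_pred, centered=True):
    """Coefficient of determination: 1 - SS_res / SS_tot (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Centered SS_tot measures spread around the mean of y_true;
    # the uncentered variant measures spread around zero.
    baseline = y_true.mean() if centered else 0.0
    ss_tot = np.sum((y_true - baseline) ** 2)
    return 1.0 - ss_res / ss_tot


# Made-up toy values, not taken from the original question.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

r2_centered = r_squared(y_true, y_pred)
r2_uncentered = r_squared(y_true, y_pred, centered=False)
print(r2_centered, r2_uncentered)
```

Comparing values across libraries is only meaningful when both use the same definition and are evaluated on the same data (training versus held-out).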

where to put freeze_support() in a Python script?

August 18, 2022 by Magenaut

I am confused about using freeze_support() for multiprocessing, and I get a RuntimeError without it. I am only running a script, not defining a function or a module. Can I still use it? Or should the packages I’m importing be using it?
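A minimal sketch of the usual placement, assuming a plain script that starts worker processes: the process-spawning code goes under an `if __name__ == "__main__":` guard, and `freeze_support()` is called as the first statement inside that guard. The function names `square` and `main` are hypothetical and chosen only for illustration.

```python
from multiprocessing import Pool, freeze_support


def square(x):
    # Work function; defined at module level so worker processes can find it.
    return x * x


def main():
    # Start a small pool and map the work function over some inputs.
    with Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Has no effect unless the program is a frozen Windows executable,
    # so it is safe to leave in an ordinary script.
    freeze_support()
    print(main())
```

The guard matters because, on platforms that start workers by re-importing the main module, unguarded top-level code would run again in every child process.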

How to normalize a NumPy array to a unit vector?

August 17, 2022 by Magenaut

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:
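The normalisation function referred to above is not included in this excerpt. As a minimal sketch of one common approach, assuming the goal is to divide a vector by its Euclidean (L2) norm, the hypothetical helper `to_unit_vector` below uses `np.linalg.norm`:

```python
import numpy as np


def to_unit_vector(v):
    """Scale v to unit Euclidean length (illustrative helper)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        # A zero vector has no direction; return it unchanged
        # rather than dividing by zero.
        return v
    return v / norm


u = to_unit_vector([3.0, 4.0])
print(u, np.linalg.norm(u))
```

Returning the zero vector unchanged is a design choice of this sketch; raising an error would be equally reasonable.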

ImportError in importing from sklearn: cannot import name check_build

August 17, 2022 by Magenaut

I am getting the following error while trying to import from sklearn:

Principal Component Analysis (PCA) in Python

August 17, 2022 by Magenaut

I have a (26424 x 144) array and I want to perform PCA on it using Python. However, I can’t find any single place on the web that explains how to do this (some sites just do PCA in their own way, and I can’t find a general approach). Any help would be appreciated.
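One general way to perform PCA with nothing but NumPy is to center the data and take its singular value decomposition. The sketch below is an assumption-laden illustration, not a definitive implementation: the function name `pca` is hypothetical, rows are taken to be samples and columns features, and a small random array stands in for the (26424 x 144) array from the question.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD of the mean-centered data (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    # Center each feature (column) at zero.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance explained by each retained component.
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    # Project the centered data onto the retained directions.
    scores = X_centered @ components.T
    return scores, components, explained_variance


# Stand-in data; replace with the real array.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
scores, components, explained_variance = pca(X, n_components=2)
print(scores.shape, components.shape)
```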

Random state (Pseudo-random number) in Scikit learn

August 17, 2022 by Magenaut

I want to implement a machine learning algorithm in scikit-learn, but I don’t understand what the random_state parameter does. Why should I use it?
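In scikit-learn, `random_state` seeds the pseudo-random number generator used by an estimator or splitter, so a fixed value makes the result repeatable. The sketch below shows the underlying idea with NumPy's `RandomState` directly rather than with a scikit-learn estimator; the helper `shuffled_indices` is hypothetical.

```python
import numpy as np


def shuffled_indices(n, seed):
    """Return a permutation of range(n) driven by a seeded generator."""
    rng = np.random.RandomState(seed)
    return rng.permutation(n)


# The same seed yields the same shuffle on every call.
first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)
# A different seed starts the generator from a different state.
other = shuffled_indices(10, seed=7)
print(first, second, other)
```

Fixing the seed is mainly useful for reproducible experiments and debugging; leaving it unset gives a different random draw on each run.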

what is the difference between ‘transform’ and ‘fit_transform’ in sklearn

August 16, 2022 by Magenaut

In the sklearn Python toolbox, there are two functions, transform and fit_transform, in sklearn.decomposition.RandomizedPCA. The descriptions of the two functions are as follows:
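The original descriptions are not reproduced in this excerpt. As a rough illustration of the general convention, `fit` learns parameters from data, `transform` applies already-learned parameters, and `fit_transform` does both on the same data. The toy class `MeanCenterer` below is hypothetical and is not part of scikit-learn; it only mimics the method names.

```python
import numpy as np


class MeanCenterer:
    """Hypothetical toy transformer following the fit/transform convention."""

    def fit(self, X):
        # Learn the per-column mean from the data passed in.
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Apply the previously learned mean; nothing is re-learned here.
        return np.asarray(X, dtype=float) - self.mean_

    def fit_transform(self, X):
        # Learn from X and transform the same X in one call.
        return self.fit(X).transform(X)


X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
X_test = np.array([[2.0, 20.0]])

centerer = MeanCenterer()
train_out = centerer.fit_transform(X_train)  # learns the mean from X_train
test_out = centerer.transform(X_test)        # reuses the training mean
print(train_out, test_out)
```

In practice this usually means calling `fit_transform` on training data and plain `transform` on test data, so that the test set is processed with parameters learned from the training set only.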

scikit-learn DBSCAN memory usage

August 16, 2022 by Magenaut

UPDATED: In the end, the solution I opted to use for clustering my large dataset was one suggested by Anony-Mousse below. That is, using ELKI’s DBSCAN implementation to do my clustering rather than scikit-learn’s. It can be run from the command line and, with proper indexing, performs this task within a few hours. Use the GUI and small sample datasets to work out the options you want to use and then go to town. Worth looking into. Anywho, read on for a description of my original problem and some interesting discussion.

Stratified Train/Test-split in scikit-learn

August 15, 2022 by Magenaut

I need to split my data into a training set (75%) and test set (25%). I currently do that with the code below:
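The asker's code is not included in this excerpt. Separately from it, here is a minimal sketch of what a stratified 75/25 split does, written with NumPy only: each class is shuffled and split on its own, so both sets keep roughly the original class proportions. The helper `stratified_split` and the toy labels are assumptions for illustration; scikit-learn's `train_test_split` offers a `stratify` parameter for the same purpose.

```python
import numpy as np


def stratified_split(y, test_size=0.25, seed=0):
    """Return (train_idx, test_idx) preserving class proportions (sketch)."""
    y = np.asarray(y)
    rng = np.random.RandomState(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        # Indices of all samples belonging to this class.
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        # Send the requested fraction of this class to the test set.
        n_test = int(round(test_size * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


# Toy labels: 80 samples of class 0 and 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
train_idx, test_idx = stratified_split(y, test_size=0.25, seed=0)
print(len(train_idx), len(test_idx))
```

The returned index arrays can be used to slice both the feature matrix and the label vector, for example `X[train_idx]` and `y[train_idx]`.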