Number of files per directory

I have a directory with about 100,000 small files (each one a text file of 1-3 lines). The directory isn't very big in total size (< 2GB). The data lives on a professionally administered NFS server. The server runs Linux. I think the filesystem is ext3, but I don't know for sure. Also, I don't have root access to the server.

These files are the output of a large scale scientific experiment, over which I don’t have control. However, I have to analyze the results.

Any I/O operation or processing in this directory is very, very slow. Opening a file (open() in Python), reading from an open file, and closing a file are all very slow. In bash, ls, du, etc. don't work.

The question is:

What is the maximum number of files in a single directory in Linux such that it is still practical to process them (open, read, etc.)? I understand that the answer depends on many things: filesystem type, kernel version, server version, hardware, etc. I just want a rule of thumb, if possible.

Answers:


Method 1

As you surmise, it does depend on many things, mostly the filesystem type and options and, to some extent, the kernel version. In the ext2/ext3/ext4 series, there was a major improvement when the dir_index option appeared (some time after the initial release of ext3): it stores directories as search trees (logarithmic-time access) rather than linear lists (linear-time access). This isn't something you can see over NFS, but if you have some contact with the admins you can ask them to run tune2fs -l /dev/something | grep features (and perhaps even convince them to upgrade). Only the number of files matters, not their size.

Even with dir_index, 100,000 files in one directory feels large. Ideally, get the authors of the program that creates the files to add a level of subdirectories. To avoid performance degradation, I would recommend a limit of about 1,000 files per directory for ext2, or for ext3 without dir_index, and about 20,000 with dir_index or reiserfs. If you can't control how the files are created, move them into separate directories before doing anything else.
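If you end up doing that split yourself, a minimal Python sketch along these lines is one way to do it, bucketing files by the first characters of their names. The paths and the bucketing rule are illustrative assumptions, not part of the original answer:

```python
#!/usr/bin/env python3
"""Spread a flat directory of many small files into subdirectories.

Minimal sketch: buckets files by the first two characters of their
names so that no single directory ends up with a huge number of entries.
The paths and the bucketing rule are illustrative assumptions.
"""
import os
import shutil

SRC = "/nfs/experiment/output"        # hypothetical flat directory with ~100000 files
DST = "/nfs/experiment/output_split"  # hypothetical destination root

# 100000 names is small enough to hold in memory; grab the listing once.
for name in os.listdir(SRC):
    src_path = os.path.join(SRC, name)
    if not os.path.isfile(src_path):
        continue
    # Bucket by the first two characters of the file name (adjust to taste).
    bucket_dir = os.path.join(DST, name[:2])
    os.makedirs(bucket_dir, exist_ok=True)
    shutil.move(src_path, os.path.join(bucket_dir, name))
```

Whether a filename prefix or a hash of the name gives evenly sized buckets depends on how the experiment names its files; the point is simply to keep each subdirectory down to a few thousand entries.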


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.
