When are objects garbage collected in Python? When is the memory released, and does the collection impact performance? Can one opt out of or tune the GC algorithm, and if so, how?
Answers:
Method 1
When are objects garbage collected in python?
There is a lot of detail in the source code for CPython: http://svn.python.org/view/python/trunk/Modules/gcmodule.c?revision=81029&view=markup
Any time a reference count drops to zero, the object is immediately removed.
/* Python’s cyclic gc should never see an incoming refcount
 * of 0: if something decref’ed to 0, it should have been
 * deallocated immediately at that time.
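The immediate deallocation described above can be observed from Python with a weak reference. This is a minimal sketch that assumes CPython's reference counting; `Node` is a hypothetical placeholder class, and the cyclic collector is switched off only to show that it plays no part here.

```python
import gc
import weakref


class Node:
    """Hypothetical placeholder; any weak-referenceable object would do."""


gc.disable()  # rule out the cyclic collector for this demonstration
try:
    obj = Node()
    ref = weakref.ref(obj)  # a weak reference does not raise the refcount

    alive_before = ref() is not None
    del obj  # drops the last strong reference, so the refcount reaches zero
    alive_after = ref() is not None
finally:
    gc.enable()

print(alive_before, alive_after)
```

On other implementations (PyPy, for example) the object may live on until a later collection, so the second value is not something to rely on in portable code.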
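On top of the configurable thresholds, a full collection (one that includes the oldest generation) is only triggered when the number of objects still waiting for their first full collection exceeds 25% of the number of objects that survived the previous full collection.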
In addition to the various configurable thresholds, we only trigger a
full collection if the ratio
    long_lived_pending / long_lived_total
is above a given value (hardwired to 25%).
When is the memory released?
I was only able to fish out this information.
/* Clear all free lists
 * All free lists are cleared during the collection of the highest generation.
 * Allocated items in the free list may keep a pymalloc arena occupied.
 * Clearing the free lists may give back memory to the OS earlier.
 */
According to this, Python may be keeping your object in a free list for recycling even if you drop its refcount to zero. I am unable to explicitly find when the free call is made to give memory back to the operating system, but I imagine that this is done whenever a collection is made and the object is not being kept in a free list.
Does the collection impact performance?
Any non-trivial garbage collector I have heard of requires both CPU and memory to operate. Therefore, yes, there is always an impact on performance. You’ll have to experiment and get to know your garbage collector.
I have run into issues with programs that require real-time responsiveness, since garbage collectors generally don’t give me control over when they run or for how long. Some peculiar cases can cause excessive memory use as well, an example being Python’s knack for keeping free lists.
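For latency-sensitive code, one possible workaround in CPython is to switch the cyclic collector off around the critical section and run a collection at a moment you choose. The sketch below uses only `gc.disable()`, `gc.enable()`, `gc.isenabled()` and `gc.collect()` from the standard library; the helper names `run_without_gc_pauses` and `timed_full_collection` are made up for illustration. Reference counting still frees non-cyclic objects while the collector is disabled.

```python
import gc
import time


def run_without_gc_pauses(critical_section):
    """Call critical_section() with the cyclic collector switched off."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return critical_section()
    finally:
        # Restore the previous state instead of enabling unconditionally.
        if was_enabled:
            gc.enable()


def timed_full_collection():
    """Run a full collection now and report how long it took."""
    start = time.perf_counter()
    unreachable = gc.collect()  # returns the number of unreachable objects found
    return unreachable, time.perf_counter() - start


result = run_without_gc_pauses(lambda: sum(range(1000)))
found, seconds = timed_full_collection()
print(result, found, f"{seconds:.6f}s")
```

Timing `gc.collect()` like this gives a rough idea of the pause a full collection would cause in your own workload.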
Method 2
Here is an excerpt from the language reference:
Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.
CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).
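The advice to always close files can be followed with a `with` block, which releases the file when the block exits rather than when the object happens to be collected. A small sketch; the temporary path is only there to make the example self-contained.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

with open(path, "w", encoding="utf-8") as handle:
    handle.write("hello")

# The file is closed on leaving the block, independent of garbage collection.
closed_after_block = handle.closed
print(closed_after_block)
```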
EDIT: Regarding postponing garbage collection: the gc module lets you interact with the garbage collector, disable it if you want to, change the collection frequency, and so on, though I have not used it myself. Also, before Python 3.4, cycles containing any objects with __del__ methods were not collected (they ended up in gc.garbage); since Python 3.4 (PEP 442) such cycles can be collected.
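A short sketch of the gc module in use, assuming CPython 3.4 or later (where, following PEP 442, cycles whose objects define `__del__` can be collected). `Tracked` is a hypothetical class that records when its finalizer runs.

```python
import gc

finalized = []


class Tracked:
    """Hypothetical class whose finalizer records its name."""

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


gc.disable()  # no automatic cyclic collection from here on
try:
    a, b = Tracked("a"), Tracked("b")
    a.other, b.other = b, a  # build a reference cycle
    del a, b  # refcounts stay above zero because of the cycle

    before = list(finalized)  # nothing finalized yet
    gc.collect()  # explicit collection finds and frees the cycle
    after = sorted(finalized)
finally:
    gc.enable()

print(before, after)
```

With the collector disabled, the cycle is only reclaimed by the explicit `gc.collect()` call, which shows both the opt-out and the manual trigger.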
Method 3
To expand on the previous answers with some more numbers and actionable information:
You can use gc.set_threshold(threshold0[, threshold1[, threshold2]]) to tune when automatic garbage collection kicks in:
The GC classifies objects into three generations depending on how many
collection sweeps they have survived. New objects are placed in the
youngest generation (generation 0). If an object survives a collection
it is moved into the next older generation. Since generation 2 is the
oldest generation, objects in that generation remain there after a
collection. In order to decide when to run, the collector keeps track
of the number object allocations and deallocations since the last
collection. When the number of allocations minus the number of
deallocations exceeds threshold0, collection starts. Initially only
generation 0 is examined. If generation 0 has been examined more than
threshold1 times since generation 1 has been examined, then generation
1 is examined as well. With the third generation, things are a bit
more complicated, see Collecting the oldest generation for more
information.
While I could not find the default thresholds in the documentation, looking through the implementation, the default values for the thresholds seem to be (CPython 3.9.1) :
threshold0: 700
threshold1: 10
threshold2: 10
That is, by default, automatic garbage collection kicks in once the number of allocations minus the number of deallocations exceeds 700.
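Rather than reading the implementation, the thresholds in effect can be queried at runtime with `gc.get_threshold()`, and the current per-generation counters with `gc.get_count()`. The sketch below raises the first two thresholds and then restores the originals; the values 1000 and 15 are arbitrary examples, not recommendations, and the defaults you see may differ between CPython versions.

```python
import gc

original = gc.get_threshold()  # thresholds currently in effect
print("thresholds:", original)

gc.set_threshold(1000, 15)  # collect the youngest generation less often
tuned = gc.get_threshold()

counts = gc.get_count()  # current collection counters, one per generation
print("tuned:", tuned, "counts:", counts)

gc.set_threshold(*original)  # put the original settings back
```

Raising threshold0 trades fewer collection passes for more uncollected cyclic garbage between passes, so it is worth measuring on a real workload.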
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.