Garbage Collection in Python: Deep Dive for Beginners & Intermediate Developers
Introduction
Hello Python friends! Memory management is one of those silent heroes that makes Python so beginner-friendly. You rarely have to worry about freeing memory manually — thanks to Python’s built-in Garbage Collector (GC).
Python uses two main techniques:
- Reference Counting – the primary and fastest mechanism
- Generational Garbage Collection – handles the tricky cases that reference counting can’t solve alone
In this article, we’ll explore how garbage collection works under the hood, why you might want to control it manually, and how to use the gc module effectively.
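To make the first mechanism concrete, here is a minimal sketch that watches a reference count go up and down. It assumes CPython, where sys.getrefcount() is available. The absolute numbers it reports vary between versions, so only the differences are meaningful. The names data and alias are purely illustrative.

```python
import sys

data = [1, 2, 3]
baseline = sys.getrefcount(data)   # absolute value is version-dependent

alias = data                       # a second name now refers to the same list
with_alias = sys.getrefcount(data)

del alias                          # drop the extra reference again
after_del = sys.getrefcount(data)

# Compare against the baseline instead of trusting the raw numbers
print("extra references while alias exists:", with_alias - baseline)
print("extra references after del alias:", after_del - baseline)
```

When the count of an object drops to zero, CPython frees it right away, which is why most objects never need the cyclic collector at all.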
Importance of Manual Garbage Collection
Most of the time, you can happily ignore garbage collection and let Python do its job. However, there are situations where manual intervention becomes very useful:
- Long-running applications (web servers, data pipelines) that need to keep memory usage low
- Debugging memory leaks
- Applications dealing with large objects or cyclic references
- Performance tuning in memory-constrained environments
- Understanding why your program is consuming more RAM than expected
Manual control gives you the power to clean up at the right moment instead of waiting for the automatic collector.
Self-Referencing (Cyclic) Objects and Garbage Collection
Reference counting works beautifully until objects start referencing each other in a cycle. Let’s see a classic example:
class Node:
    def __init__(self, name):
        self.name = name
        self.ref = None

a = Node("A")
b = Node("B")
a.ref = b
b.ref = a  # Now we have a cycle!
del a
del b  # Even after deleting, memory may not be freed immediately
Here, even though we deleted the variables a and b, the two objects still point to each other. Their reference count never reaches zero, so the reference counter can’t clean them up. This is exactly where Python’s generational garbage collector steps in.
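You can watch this happen with a weak reference, which lets you check whether an object is still alive without keeping it alive. The sketch below is only an illustration: it redefines the Node class so it runs on its own, and it switches the automatic collector off for a moment so that a background run cannot clean up the cycle before we look. The probe variable is a name made up for this example.

```python
import gc
import weakref

class Node:
    def __init__(self, name):
        self.name = name
        self.ref = None

was_enabled = gc.isenabled()
gc.disable()                  # avoid an automatic run in the middle of the demo

a = Node("A")
b = Node("B")
a.ref = b
b.ref = a                     # a <-> b form a cycle

probe = weakref.ref(a)        # observes the object without owning it
del a
del b

alive_before = probe() is not None   # the cycle keeps both objects alive
gc.collect()                          # the cyclic collector finds the pair
alive_after = probe() is not None

print("alive before collect:", alive_before)
print("alive after collect:", alive_after)

if was_enabled:
    gc.enable()               # restore the previous setting
```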
The ‘gc’ Module – Your Tool for Manual Control
Python provides the gc module to interact with the garbage collector directly.
import gc
print(gc.isenabled()) # Usually returns True
gc.collect() Method
This is the most commonly used function. It forces the garbage collector to run immediately.
# Force a full garbage collection
collected = gc.collect()
print(f"Garbage collector collected {collected} objects")
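The number returned by gc.collect() is the count of unreachable objects it found. A small experiment, sketched below under the assumption that nothing else in the process is creating garbage at the same time, builds 100 self-referencing lists and then checks that the collector reports at least that many. The variable names are illustrative.

```python
import gc

was_enabled = gc.isenabled()
gc.disable()               # keep automatic runs from collecting the cycles early
gc.collect()               # start from a clean slate

for _ in range(100):
    loop = []
    loop.append(loop)      # the list contains itself: a one-object cycle
del loop                   # drop the last name; all 100 lists are now unreachable

found = gc.collect()
print(f"gc.collect() found {found} unreachable objects")

if was_enabled:
    gc.enable()            # restore the previous setting
```

The reported number can be higher than 100 if other unreachable objects happen to exist, so treat it as a lower bound in this experiment.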
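You can also limit collection to a specific generation. In CPython's classic three-generation collector, collecting a generation also collects every younger one:
gc.collect(0) # Generation 0 only (youngest objects)
gc.collect(1) # Generations 0 and 1
gc.collect(2) # All generations (same as a full collection)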
Thresholds – Controlling When GC Runs Automatically
Python divides objects into three generations (0, 1, 2). New objects start in generation 0. If they survive a collection, they move to the next generation.
You can check and modify the collection thresholds:
# Get current thresholds
print(gc.get_threshold())
# Default is usually (700, 10, 10)
# Set new thresholds
gc.set_threshold(1000, 15, 15)
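# Get the current collection counters (count0, count1, count2)
# These are compared against the thresholds; they are not per-generation object totals
print(gc.get_count())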
Lower thresholds = more frequent collections (higher CPU usage, lower memory). Higher thresholds = fewer collections (lower CPU, higher memory usage).
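If you experiment with thresholds, it is a good habit to remember the original values and put them back afterwards. The following is a minimal sketch of that pattern. The numbers are the same illustrative ones used above, not a recommendation, and the exact values reported (especially the third) can differ between CPython versions.

```python
import gc

original = gc.get_threshold()      # remember the interpreter's settings
gc.set_threshold(1000, 15, 15)     # make youngest-generation collections rarer
updated = gc.get_threshold()
print("while tuned:", updated)

# ... run the workload you want to measure here ...

gc.set_threshold(*original)        # put the previous thresholds back
restored = gc.get_threshold()
print("restored:", restored)
```

Measuring memory and run time before and after such a change is the only reliable way to tell whether it helps your workload.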
Disabling and Enabling Garbage Collection
You can temporarily turn off automatic garbage collection (useful in performance-critical sections):
gc.disable() # Turn off automatic GC
# ... do heavy work here ...
gc.enable() # Turn it back on
Warning: Only disable GC if you know what you’re doing and plan to call gc.collect() manually at safe points.
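One way to make this safer is to wrap the disable/enable pair in a context manager, so the collector is switched back on even if the heavy work raises an exception. The helper below, gc_paused, is a hypothetical name for this sketch and not part of the gc module. It also leaves the collector off if it was already off when the block started.

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Temporarily disable automatic GC, then restore the previous state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

state_before = gc.isenabled()
with gc_paused():
    state_inside = gc.isenabled()            # automatic GC is off in here
    totals = [n * n for n in range(1000)]    # placeholder for the heavy work
state_after = gc.isenabled()

print(state_before, state_inside, state_after)
```

Reference counting keeps working while the collector is paused, so only cyclic garbage accumulates during the block.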
Case Studies
Case Study 1: Memory Leak in a Long-Running Web Server
A Flask/Django application slowly increases RAM usage over days. After analysis, you discover many cyclic references created by ORM objects and signal handlers.
Solution:
import gc
# At the end of each request or every few minutes
collected = gc.collect()
print(f"Cleaned {collected} unreachable objects")
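Calling gc.collect() on every single request can get expensive, so the "every few minutes" variant is often the gentler choice. Here is one possible sketch of a throttle. collect_if_due and its min_interval_seconds parameter are made-up names for this example, and how you hook it into Flask or Django (for instance from a request hook or a background task) is left open.

```python
import gc
import time

_last_collect = time.monotonic()   # time of the most recent manual collection

def collect_if_due(min_interval_seconds=300.0):
    """Run a full collection only if enough time has passed.

    Returns the number of unreachable objects found, or None if skipped.
    """
    global _last_collect
    now = time.monotonic()
    if now - _last_collect < min_interval_seconds:
        return None                # too soon, skip this round
    _last_collect = now
    return gc.collect()

print(collect_if_due())            # skipped: the default interval has not elapsed
print(collect_if_due(0.0))         # an interval of zero always collects
```

Keep in mind that a manual collection only treats the symptom. If the growth comes from objects that are still reachable (caches, global lists, registered handlers), gc.collect() cannot free them.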
Case Study 2: Processing Millions of Temporary Objects
You’re running a data processing script that creates thousands of large objects in a loop.
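import gc

# large_dataset and process_batch stand in for your own data and logic
for batch_number, batch in enumerate(large_dataset, start=1):
    process_batch(batch)
    if batch_number % 50 == 0:
        gc.collect()  # Clean up after every 50 batches
        print("Memory cleaned")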
Case Study 3: Debugging with gc.get_objects()
When you suspect a memory leak, you can inspect all objects tracked by the GC:
import gc
all_objects = gc.get_objects()
print(len(all_objects)) # Total objects being tracked
# Find specific types
node_objects = [obj for obj in all_objects if isinstance(obj, Node)]
print(f"Number of Node objects still alive: {len(node_objects)}")
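When you don't yet know which type is leaking, a histogram of tracked objects by type is a handy first step. The sketch below uses collections.Counter for that. top_tracked_types is a hypothetical helper name, and the output depends entirely on what your process has loaded.

```python
import gc
from collections import Counter

def top_tracked_types(limit=5):
    """Return the most common type names among GC-tracked objects."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)

for type_name, count in top_tracked_types():
    print(f"{type_name}: {count}")
```

Running this at two points in time and comparing the numbers usually shows which type keeps growing. Note that gc.get_objects() only lists objects the collector tracks (mostly containers), so simple values such as integers and strings do not show up.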
Best Practices for Python Developers
- Don’t call gc.collect() too frequently; it has a cost.
- Use the weakref module to avoid creating unnecessary strong reference cycles.
- Profile memory usage with tools like memory_profiler, tracemalloc, or objgraph.
- In performance-critical loops, consider disabling GC temporarily.
- Always test memory behavior in production-like conditions.
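To illustrate the weakref tip from the list above, here is a minimal sketch of a parent/child structure where the child points back to its parent through a weak reference. TreeNode is a made-up class for this example. Because no strong cycle exists, CPython's reference counting can free the parent as soon as the last strong reference disappears, with no help from the cyclic collector.

```python
import weakref

class TreeNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []             # strong references: parent owns children
        # weak back-reference: the child does not keep its parent alive
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        """Return the parent node, or None if it is gone or was never set."""
        return self._parent() if self._parent is not None else None

root = TreeNode("root")
child = TreeNode("child", parent=root)
parent_name = child.parent.name        # the parent is reachable while root lives

probe = weakref.ref(root)
del root                               # last strong reference to the parent
root_freed = probe() is None           # freed by reference counting in CPython
orphan_parent = child.parent           # the weak back-reference now yields None

print(parent_name, root_freed, orphan_parent)
```

The trade-off is that code using the back-reference has to handle the case where the parent is already gone.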
Quick Summary
Python’s garbage collector is smart and usually does a great job. However, understanding how it works and knowing how to control it with the gc module gives you a powerful tool for writing efficient, memory-friendly applications.
Whether you’re building a high-traffic web service, processing huge datasets, or just want to write cleaner code — mastering garbage collection will make you a much stronger Python developer.
Now it’s your turn! Try creating a small cyclic reference example, play with gc.collect() and thresholds, and observe the difference in memory usage.
Happy coding and keep your memory clean! 🧹
