Garbage Collection in Python: Deep Dive for Beginners & Intermediate Developers

Introduction

Hello Python friends! Memory management is one of those silent heroes that makes Python so beginner-friendly. You rarely have to worry about freeing memory manually — thanks to Python’s built-in Garbage Collector (GC).

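Python uses two main techniques:

  • Reference counting: every object tracks how many references point to it, and it is freed as soon as that count drops to zero.
  • Generational (cyclic) garbage collection: a separate collector periodically looks for groups of objects that only reference each other and frees them.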

In this article, we’ll explore how garbage collection works under the hood, why you might want to control it manually, and how to use the gc module effectively.
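
As a rough illustration of reference counting, the sketch below uses sys.getrefcount() to watch a count rise and fall. The Payload class and the variable names are made up for this example. The absolute numbers are a CPython implementation detail (the call itself can add a temporary reference), so only the differences are compared.

```python
import sys


class Payload:
    """Hypothetical placeholder class, used only for this demo."""


obj = Payload()
baseline = sys.getrefcount(obj)      # absolute value is implementation-specific

alias = obj                          # a second name for the same object
with_alias = sys.getrefcount(obj)

del alias                            # drop the extra reference again
without_alias = sys.getrefcount(obj)

print(with_alias - baseline)         # difference while alias exists
print(without_alias - baseline)      # difference after del alias
```

The point is that binding another name to an object adds a reference and deleting that name removes it; nothing is copied or freed along the way.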

Importance of Manual Garbage Collection

Most of the time, you can happily ignore garbage collection and let Python do its job. However, there are situations where manual intervention becomes very useful:

  • Long-running applications (web servers, data pipelines) that need to keep memory usage low
  • Debugging memory leaks
  • Applications dealing with large objects or cyclic references
  • Performance tuning in memory-constrained environments
  • Understanding why your program is consuming more RAM than expected

Manual control gives you the power to clean up at the right moment instead of waiting for the automatic collector.

Self-Referencing (Cyclic) Objects and Garbage Collection

Reference counting works beautifully until objects start referencing each other in a cycle. Let’s see a classic example:

class Node:
    def __init__(self, name):
        self.name = name
        self.ref = None

a = Node("A")
b = Node("B")
a.ref = b
b.ref = a   # Now we have a cycle!

del a
del b       # Even after deleting, memory may not be freed immediately

Here, even though we deleted the names a and b, the two objects still point to each other. Each one’s reference count stays at 1 instead of dropping to zero, so reference counting alone can never free them. This is exactly where Python’s generational garbage collector steps in: it detects unreachable cycles like this one and frees them.
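
One way to watch this happen is to keep a weak reference to one of the objects and check it before and after a collection. This is a minimal sketch that assumes CPython; the probe variable and the two alive_* flags are names invented for the demo, and automatic collection is switched off briefly so it cannot run in the middle.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.ref = None


gc.disable()                 # keep the automatic collector out of the demo

a = Node("A")
b = Node("B")
a.ref = b
b.ref = a                    # the cycle

probe = weakref.ref(a)       # a weak reference does not keep the object alive
del a
del b

alive_before = probe() is not None   # cycle still exists: refcounts never hit zero
gc.collect()                         # ask the cyclic collector to run now
alive_after = probe() is not None    # the cycle has been found and freed

gc.enable()
print(alive_before, alive_after)
```

The weak reference acts as an observer: it reports whether the object still exists without adding to its reference count.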

The ‘gc’ Module – Your Tool for Manual Control

Python provides the gc module to interact with the garbage collector directly.

import gc
print(gc.isenabled())   # Usually returns True

gc.collect() Method

This is the most commonly used function. It forces a collection to run immediately and returns the number of unreachable objects it found.

# Force a full garbage collection
collected = gc.collect()
print(f"Garbage collector collected {collected} objects")

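You can also pass a generation number. In CPython’s classic generational collector, collecting a generation also collects every younger one:

gc.collect(0)  # Generation 0 only (youngest objects)
gc.collect(1)  # Generations 0 and 1
gc.collect(2)  # All generations (same as a full collection)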
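
To see what each generation has been doing, gc.get_stats() reports per-generation counters. A minimal sketch, assuming a CPython build where it returns one dictionary per generation:

```python
import gc

gc.collect()  # run a collection first so the counters are not all zero

stats = gc.get_stats()  # list of dicts, one per generation
for generation, info in enumerate(stats):
    print(
        f"gen {generation}: "
        f"{info['collections']} runs, "
        f"{info['collected']} collected, "
        f"{info['uncollectable']} uncollectable"
    )
```

The exact numbers depend on everything the interpreter has done so far, so treat them as a trend to watch rather than values to compare between runs.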

Thresholds – Controlling When GC Runs Automatically

Python divides objects into three generations (0, 1, 2). New objects start in generation 0. If they survive a collection, they move to the next generation.

You can check and modify the collection thresholds:

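# Get current thresholds
print(gc.get_threshold())
# Often (700, 10, 10) on CPython, but the defaults can differ between versions

# Set new thresholds
gc.set_threshold(1000, 15, 15)

# Get the current collection counters (count0, count1, count2).
# These are the values compared against the thresholds, not a census of live objects.
print(gc.get_count())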

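In CPython’s classic generational collector, the first threshold is how far container-object allocations may outnumber deallocations before generation 0 is collected; the second and third set how many collections of the younger generation happen before the next older generation is examined too. Lower thresholds mean more frequent collections (more CPU time, less memory held by garbage). Higher thresholds mean fewer collections (less CPU time, more memory held by garbage).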
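
When experimenting with thresholds, it can help to save the original values and restore them afterwards. The sketch below is one way to do that; the variable names are illustrative, and some newer CPython versions may ignore or report the third value differently, so only the first value is relied on here.

```python
import gc

original = gc.get_threshold()         # remember the interpreter's settings

gc.set_threshold(1000, 15, 15)        # try out different values
changed = gc.get_threshold()

print("counters now:", gc.get_count())  # compared against the thresholds

gc.set_threshold(*original)           # put the original settings back
restored = gc.get_threshold()

print(original, changed, restored)
```

Restoring the saved tuple keeps an experiment in one part of a program from silently changing collection behavior everywhere else.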

Disabling and Enabling Garbage Collection

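You can temporarily turn off automatic garbage collection (useful in performance-critical sections). This only pauses the cyclic collector; reference counting keeps freeing objects as usual:

gc.disable()   # Turn off automatic (cyclic) GC
try:
    ...        # do heavy work here
finally:
    gc.enable()    # Turn it back on, even if the work raises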

Warning: Only disable GC if you know what you’re doing and plan to call gc.collect() manually at safe points.
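
If you pause the collector in more than one place, a small context manager keeps the disable/enable pair together. This is a sketch, not part of the gc module: gc_paused is a hypothetical helper name, and it deliberately restores whatever state it found instead of always re-enabling.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Hypothetical helper: pause automatic GC, then restore the previous state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:          # do not enable GC if the caller had it off
            gc.enable()


before = gc.isenabled()
with gc_paused():
    inside = gc.isenabled()                  # automatic GC is off in here
    data = [[i] for i in range(10_000)]      # allocation-heavy work
after = gc.isenabled()                       # same state as before the block

print(before, inside, after)
```

Restoring the earlier state matters when such blocks are nested or when the application has already disabled GC on purpose.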

Case Studies

Case Study 1: Memory Leak in a Long-Running Web Server

A Flask/Django application slowly increases RAM usage over days. After analysis, you discover many cyclic references created by ORM objects and signal handlers.

Solution:

import gc
# At the end of each request or every few minutes
collected = gc.collect()
print(f"Cleaned {collected} unreachable objects")
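
Because every gc.collect() call costs time, running it on each request can hurt latency. One possible compromise is to throttle it by elapsed time. The sketch below assumes a hypothetical PeriodicCollector helper (not part of Flask, Django, or the gc module) whose maybe_collect() method you would call from your own request hook.

```python
import gc
import time


class PeriodicCollector:
    """Hypothetical helper: run gc.collect() at most once per interval."""

    def __init__(self, interval_seconds=300.0, clock=time.monotonic):
        self._interval = interval_seconds
        self._clock = clock              # injectable so the class can be tested
        self._last = clock()

    def maybe_collect(self):
        """Return the gc.collect() result, or None if it is not time yet."""
        now = self._clock()
        if now - self._last < self._interval:
            return None
        self._last = now
        return gc.collect()


collector = PeriodicCollector(interval_seconds=300.0)

# Call this at the end of each request; most calls return immediately.
result = collector.maybe_collect()
if result is not None:
    print(f"Cleaned {result} unreachable objects")
```

The interval of 300 seconds is only a starting point; measure your own application before settling on a value.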

Case Study 2: Processing Millions of Temporary Objects

You’re running a data processing script that creates huge numbers of temporary objects in a loop.

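import gc

# large_dataset and process_batch are placeholders for your own data and logic
for batch_number, batch in enumerate(large_dataset, start=1):
    process_batch(batch)
    if batch_number % 50 == 0:
        gc.collect()   # Clean up after every 50 batches
        print("Memory cleaned")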

Case Study 3: Debugging with gc.get_objects()

When you suspect a memory leak, you can inspect all objects tracked by the GC:

import gc
all_objects = gc.get_objects()
print(len(all_objects))  # Total objects being tracked

# Find specific types (Node is the class from the cyclic reference example earlier)
node_objects = [obj for obj in all_objects if isinstance(obj, Node)]
print(f"Number of Node objects still alive: {len(node_objects)}")
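
When you do not yet know which type is leaking, a histogram of tracked objects by type is often a better first step. The sketch below is one way to build it; top_tracked_types, count_instances, and the Marker class are hypothetical names made up for the example.

```python
import gc
from collections import Counter


def top_tracked_types(limit=5):
    """Return the most common type names among GC-tracked objects."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)


def count_instances(cls):
    """Count GC-tracked objects whose type is cls or a subclass of it."""
    return sum(1 for obj in gc.get_objects() if issubclass(type(obj), cls))


class Marker:
    """Hypothetical class standing in for the type you suspect is leaking."""


kept_alive = [Marker() for _ in range(3)]   # simulate objects that never go away

print(top_tracked_types())
print(count_instances(Marker))
```

Taking this histogram at two points in time and comparing the counts usually shows which type keeps growing.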

Best Practices for Python Developers

  • Don’t call gc.collect() too frequently; every call has a cost.
  • Use the weakref module to avoid creating unnecessary strong reference cycles.
  • Profile memory usage with tools like memory_profiler, tracemalloc, or objgraph.
  • In performance-critical loops, consider disabling GC temporarily.
  • Always test memory behavior in production-like conditions.
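
To make the weakref advice concrete, here is a minimal sketch of a parent/child structure where the back-pointer is weak, so no strong cycle is ever formed. The Parent and Child classes and the parent_ref attribute are invented for this example, and the immediate cleanup shown relies on CPython’s reference counting.

```python
import gc
import weakref


class Parent:
    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, child):
        self.children.append(child)            # strong reference: parent -> child
        child.parent_ref = weakref.ref(self)   # weak reference: child -> parent


class Child:
    def __init__(self, name):
        self.name = name
        self.parent_ref = None


gc.disable()                   # rule out the cyclic collector for this demo

parent = Parent("root")
child = Child("leaf")
parent.add_child(child)

probe = weakref.ref(parent)
del parent                     # the last strong reference to the parent is gone

parent_freed = probe() is None                      # freed without any gc.collect()
child_sees_parent = child.parent_ref() is not None  # the weak back-pointer is now dead

gc.enable()
print(parent_freed, child_sees_parent)
```

The trade-off is that code using parent_ref must handle the case where calling it returns None, because the parent may already be gone.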

Quick Summary

Python’s garbage collector is smart and usually does a great job. However, understanding how it works and knowing how to control it with the gc module gives you a powerful tool for writing efficient, memory-friendly applications.

Whether you’re building a high-traffic web service, processing huge datasets, or just want to write cleaner code — mastering garbage collection will make you a much stronger Python developer.

Now it’s your turn! Try creating a small cyclic reference example, play with gc.collect() and thresholds, and observe the difference in memory usage.

Happy coding and keep your memory clean! 🧹
