If you write Python code, there’s probably been a time or two when you saw the dreaded “MemoryError”. It appears when a Python script stops because your computer has no spare RAM left to execute it. I recently experienced this frustration whilst trying to write hundreds of thousands of CSV files. This time, however, I reached for tools that support smarter memory management. Now I can watch my computer’s memory bounce around with the Windows Resource Monitor, and Python has quite a few memory profiling libraries for monitoring memory too!
Python Libraries and Guides
Memory Management Overview, Python documentation
Memory Profiler: “monitor memory usage of Python code”
psutil: “Cross-platform lib for process and system monitoring in Python”
py-spy: “Sampling profiler for Python programs”
pyinstrument: “🚴 Call stack profiler for Python. Shows you why your code is slow!”
Scalene: “a high-performance, high-precision CPU, GPU, and memory profiler for Python”
Yappi: “Yet Another Python Profiler, but this time thread&coroutine&greenlet aware.”
line_profiler: “Line-by-line profiling for Python”
pprofile: “Line-granularity, thread-aware deterministic and statistic pure-python profiler”
Guppy 3: “Python programming environment and heap analysis toolset”
See also: The Python Profilers, Python documentation
“CPython standard distribution comes with three deterministic profilers. cProfile is implemented as a C module based on lsprof, Profile is in pure Python and hotshot can be seen as a small subset of a cProfile.” — Yappi GitHub, https://github.com/sumerc/yappi
Task Manager: Windows process management tool with some memory analytics
Resource Monitor: Windows tool with Memory, CPU, Disk and Network monitoring tabs
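To make the deterministic profilers from the quote above concrete, here is a minimal sketch of running the standard library’s cProfile programmatically and printing the top functions by cumulative time. The `build_rows` function is a hypothetical workload made up for this example.

```python
import cProfile
import io
import pstats

def build_rows(n):
    # Hypothetical workload: allocate a list of strings worth profiling.
    return [f"row-{i}" for i in range(n)]

profiler = cProfile.Profile()
profiler.enable()
rows = build_rows(100_000)
profiler.disable()

# Send the report to a string buffer instead of stdout,
# sorted by cumulative time, limited to the top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The same `Profile` object can be reused across several `enable()`/`disable()` windows, which is handy when you only want to profile a specific hot section of a longer script.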
Memory Tips and Guides
- Memory Management in Python, Towards Data Science: this article shows some memory-efficient ways to write your code.
- Use only the data you need. Any data you read in and aren’t using is still held in memory. The usecols argument in pandas is a great way to read a CSV and keep only the columns you need.
- Reading data in chunks with the chunksize argument is another way to reduce memory usage for large datasets.
- Measuring the memory usage of a Pandas dataframe
- Some tools are line-oriented, others are function-oriented. If your code contains large functions, you might favor a line-based profiling tool.
- Be aware of the overhead some memory tools may incur. The Scalene PyCon talk shows an awesome comparison of the overhead of these Python profiling libraries.
- “profile, cProfile, and pstats – Performance analysis of Python programs.”
- “Profiling and Analyzing Performance of Python Programs”
- 20 Linux Memory Management Command Line Tools
- “Random-access Memory (RAM)”
- “Cache Memory”
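The pandas tips above can be sketched in a few lines. This is a minimal example, using a small in-memory CSV (with made-up columns) to stand in for a large file on disk: usecols reads only the columns you need, chunksize processes the file piece by piece, and memory_usage reports how much RAM a dataframe is holding.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for a large file on disk (hypothetical columns).
csv_text = "id,name,score,notes\n1,a,0.5,x\n2,b,0.7,y\n3,c,0.9,z\n"

# 1) Read only the columns you need with usecols.
slim = pd.read_csv(io.StringIO(csv_text), usecols=["id", "score"])

# 2) Read in chunks with chunksize; each chunk is a small DataFrame,
#    so only one chunk is in memory at a time.
total = 0.0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["score"].sum()

# 3) Inspect per-column memory use; deep=True also counts the payload
#    of object (string) columns rather than just pointer sizes.
usage = slim.memory_usage(deep=True)
print(usage)
```

For a real file you would pass a path instead of the StringIO buffer; the arguments work the same way.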
When you see “MemoryError” depends on your computer’s hardware, the size of your dataset and what operations you need to script out. Generally speaking, I/O operations such as file reads and writes are among the more expensive ones.
The tools in this post will help you anticipate how much computing power you have available, monitor your memory consumption more closely and avoid pushing your computer past its limits. You can do things like reading data in chunks and using only the columns you need to reduce your memory consumption. Applying these tools and strategies can make getting things done with Python a smoother ride. Cool runnings!