Memory management is one of the most important interview topics for Python developers.
Some common questions are:
How to get the memory address of a Python object?
How is garbage collection implemented in Python?
How does Python optimize memory usage? (What is the interning mechanism?)
How to get the memory address of a Python object?
In CPython, the built-in id() function returns the memory address of an object, and hex() converts that address to hexadecimal:
>>> num1 = 10
>>> id(num1)
140731982074952
>>> hex(id(num1))
'0x7ffeb7ccd448'
How to dereference the address?
>>> import _ctypes
>>> print(_ctypes.PyObj_FromPtr(140731982074952))
10
As shown above, the PyObj_FromPtr() function from CPython's internal _ctypes module returns the object stored at a given memory address. The address must belong to an object that is still alive; passing a stale or arbitrary address can crash the interpreter.
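The same round trip can also be sketched with the public ctypes module instead of the underscore-prefixed one. This is a minimal illustration that assumes CPython, where id() is the object's address; the variable names are only for the example.

```python
import ctypes

# Keep a reference to the object so it stays alive while we use its address.
value = ["spam", "eggs"]
address = id(value)  # in CPython, id() is the object's memory address

# Reinterpret the raw address as a Python object pointer and read it back.
recovered = ctypes.cast(address, ctypes.py_object).value

print(hex(address))
print(recovered is value)
```

Because the list is still referenced by 'value', the address is guaranteed to point at a live object here.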
How Does Python Optimize Memory Usage (Interning Mechanism)?
Python has a special feature called integer caching, a form of interning, that optimizes memory usage. If we don't understand it, some Python code will confuse us:
code snippet #1
>>> a=256
>>> b=256
>>> hex(id(a))
'0x7ffeb7ccf308'
>>> hex(id(b))
'0x7ffeb7ccf308'
code snippet #2
>>> c=257
>>> d=257
>>> hex(id(c))
'0x1d30ca5f210'
>>> hex(id(d))
'0x1d30ca5e910'
At first glance, these results look strange:
(1) 'a' and 'b' reference the same address.
(2) 'c' and 'd' reference different addresses.
The reason is that Python uses an interning mechanism to optimize memory usage.
To save time and memory, CPython preallocates all the small integers in the range [-5, 256] at startup. When a variable is assigned an integer in this range, Python simply makes it reference the cached object instead of creating a new one.
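The cache boundary can be checked with a short script. This sketch assumes CPython and builds the integers from strings at runtime, so that the compiler has no chance to merge equal constants; other Python implementations may behave differently.

```python
# Build the integers at runtime so the compiler cannot merge equal constants.
a = int("256")
b = int("256")
c = int("257")
d = int("257")

small_shared = a is b   # both names point at the cached 256 object
large_shared = c is d   # each int("257") call builds a new object in CPython

print(small_shared, large_shared)
```

The 'is' operator compares identity (the same address), whereas '==' compares values, which is why 'c == d' is still true even though the objects differ.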
The Above Results Are Different if the Python Compiler Can See the Whole Picture
If we run the same code as a Python script, 'c' and 'd' will refer to the same address.
The Python compiler performs many optimizations for us under the hood. If we run our code as a whole script, the compiler can "see" the whole program at once and, for example, store equal constants only once.
However, if we run the code line by line in the interactive Python shell, the compiler sees only one statement at a time.
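The difference between the two modes can be approximated inside a single program by compiling the source once as a whole and once line by line. This is a rough sketch that assumes CPython; the "<script>" and "<shell>" file names are placeholders, and the real shell compiles its input in a slightly different mode.

```python
source = "c = 257\nd = 257\n"

# Case 1: compile both lines as one unit, the way a script is compiled.
whole_ns = {}
exec(compile(source, "<script>", "exec"), whole_ns)
same_when_whole = whole_ns["c"] is whole_ns["d"]

# Case 2: compile and run each line on its own, roughly like the shell.
line_ns = {}
for line in source.splitlines():
    exec(compile(line, "<shell>", "exec"), line_ns)
same_when_split = line_ns["c"] is line_ns["d"]

print(same_when_whole, same_when_split)
```

In the first case both assignments share one constant inside a single code object, so the two names end up on the same object. In the second case each line gets its own code object, and on a typical CPython build the two 257 objects are distinct.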
Python has many internal optimization mechanisms; we should understand how they work and how they affect our code.
We have seen how integer caching reduces time and memory costs. Next, consider how caching behaves for mutable and immutable types.
Two lists with equal contents, say 'list1' and 'list2', never reference the same address, because lists are mutable and each one must be a separate object.
Two equal tuple literals, say 'tup1' and 'tup2', can reference the same address when they are compiled together, because tuples are immutable and sharing them is safe.
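A small script makes the list versus tuple contrast concrete. The names 'list1', 'list2', 'tup1', and 'tup2' follow the text; build_pairs() is a hypothetical helper that keeps all four literals in one code object. The list result is guaranteed by the language, while the tuple result is a CPython compiler optimization and should not be relied on.

```python
def build_pairs():
    # Lists are mutable, so every list display creates a distinct object.
    list1 = [1, 2, 3]
    list2 = [1, 2, 3]
    # Tuples are immutable; CPython may store equal tuple constants once
    # per code object, so both names can end up sharing one object.
    tup1 = (1, 2, 3)
    tup2 = (1, 2, 3)
    return list1, list2, tup1, tup2

list1, list2, tup1, tup2 = build_pairs()
print(list1 is list2)   # identity of the two lists
print(tup1 is tup2)     # identity of the two tuples
```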
How Does Python Collect Garbage (Objects That Have No References)?
Java, the .NET languages, and many other platforms implement garbage collection, and for good reason: memory is limited, so objects that are no longer needed must be removed somehow.
Fortunately, Python has an automatic garbage collection mechanism, so we rarely need to manage memory by hand when building software.
CPython's primary technique is reference counting: each object keeps track of how many references point to it, and the object is reclaimed once that count drops to zero.
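In the following code, 'a' and 'b' both reference the same object, so conceptually its reference count is 2. (The counts in the comments are illustrative; CPython itself holds additional references to a cached small integer such as 10.)
>>> a = 10  # ref count: 1
>>> b = a   # ref count: 2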
>>> a
10
>>> b
10
>>> hex(id(a))
'0x7ffeb7ccd448'
>>> hex(id(b))
'0x7ffeb7ccd448'
If we delete one of the names with del, the count drops to 1 and the object stays reachable through the other name.
Once the reference count reaches 0, the object is cleaned up by the garbage collector.
>>> del a
>>> b
10
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
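Deleting a name only removes one reference; the object itself goes away when the last reference does. The following sketch observes this with a weak reference, assuming CPython, where reclamation happens as soon as the count reaches zero. Node is a hypothetical placeholder class, used because small integers such as 10 are cached and are never freed.

```python
import weakref

class Node:
    """Hypothetical placeholder class that supports weak references."""

a = Node()
b = a                      # two names, one object
watcher = weakref.ref(a)   # a weak reference does not keep the object alive

del a
alive_after_first_del = watcher() is not None   # 'b' still refers to the object

del b
alive_after_second_del = watcher() is not None  # no strong references remain

print(alive_after_first_del, alive_after_second_del)
```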
How to get an object's reference count in Python?
The sys module provides the getrefcount() function, which returns the reference count of an object.
Syntax:
sys.getrefcount(object)
Because the object is passed to the function as an argument, the call itself creates a temporary reference, so the returned value is generally one higher than we might expect.
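Since the absolute number depends on the interpreter version and on the temporary argument reference, it is usually more informative to compare counts before and after a change. A minimal sketch, assuming CPython and using a fresh list so that nothing else holds a reference to it:

```python
import sys

data = ["a", "b"]          # a fresh list, so no other code holds a reference
before = sys.getrefcount(data)

alias = data               # bind a second name to the same object
with_alias = sys.getrefcount(data)

del alias                  # remove the second name again
after = sys.getrefcount(data)

print(before, with_alias, after)
```

The middle value should be exactly one higher than the other two, which matches the a/b example above: each extra name adds one reference, and del removes it again.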