Cool new features in Python 3.11.
Performance
Generally around 1.2x faster, thanks to an adaptive interpreter (PEP 659) that optimises bytecode based on observed behaviour and usage.
Take, for example, the LOAD_ATTR instruction, which under 3.11 can be replaced by LOAD_ATTR_ADAPTIVE.
The adaptive instruction then replaces itself with the most optimised variant for what is actually being done, such as:
- LOAD_ATTR_INSTANCE_VALUE
- LOAD_ATTR_MODULE
- LOAD_ATTR_SLOT
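For instance, here's a minimal sketch (assuming CPython 3.11; Point and get_x are just illustrative names) where attribute access on a plain instance specialises to LOAD_ATTR_INSTANCE_VALUE:
import dis

class Point:
    def __init__(self, x):
        self.x = x

def get_x(p):
    return p.x

p = Point(1.0)
for _ in range(10):  # enough calls to cross the warm-up threshold
    get_x(p)

dis.dis(get_x, adaptive=True)
# Once specialised, the LOAD_ATTR line shows up as LOAD_ATTR_INSTANCE_VALUE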
Disassembling some code:
def feet_to_meters(feet):
    return 0.3048 * feet

for feet in (1.0, 10.0, 100.0, 1000.0, 2000.0, 3000.0, 4000.0):
    print(f"{feet:7.1f} feet = {feet_to_meters(feet):7.1f} meters")

import dis
dis.dis(feet_to_meters, adaptive=True)
# 1 0 RESUME 0
#
# 2 2 LOAD_CONST 1 (0.3048)
# 4 LOAD_FAST 0 (feet)
# 6 BINARY_OP 5 (*)
# 10 RETURN_VALUE
However, when the interpreter is given more concrete types to work with, it's able to optimise. For example, after one more call with a float outside the loop, floating-point instructions are put to work:
print(f"7000.0 feet = {feet_to_meters(7000.0):7.1f} meters")
dis.dis(feet_to_meters, adaptive=True)
# 1 0 RESUME_QUICK 0
#
# 2 2 LOAD_CONST__LOAD_FAST 1 (0.3048)
# 4 LOAD_FAST 0 (feet)
# 6 BINARY_OP_MULTIPLY_FLOAT 5 (*)
# 10 RETURN_VALUE
This adaptive behaviour takes effect at runtime. In the above example, the 8th call to feet_to_meters is the trigger point for the floating-point optimisation.
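To probe the trigger point yourself, something along these lines should work (a sketch assuming CPython 3.11; the exact warm-up threshold is an implementation detail and may change between versions, and mul is just an illustrative name):
import dis

def mul(a, b):
    return a * b

for _ in range(7):
    mul(1.0, 2.0)
dis.dis(mul, adaptive=True)  # still the generic BINARY_OP

mul(1.0, 2.0)                # the 8th call crosses the warm-up threshold
dis.dis(mul, adaptive=True)  # now BINARY_OP_MULTIPLY_FLOAT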
Zero-cost exceptions
try blocks are now cheaper in the case where no exception occurs.
The compiler now builds a table of locations to jump to when an exception is raised, as opposed to explicitly pushing handler information onto the stack at runtime (which meant work for every try/except). Without this exception-handling bloat, frame objects can be reduced by 240 bytes per function.
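A quick way to see this is to disassemble a function containing a try block under 3.11 (a minimal sketch; divide is just an illustrative name):
import dis

def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return 0.0

dis.dis(divide)
# The happy path contains no block-setup instructions; instead the
# output ends with an "ExceptionTable:" section mapping bytecode
# ranges to their exception handlers.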
Faster startup with pycache
As the step to generate bytecode is costly, Python caches it in a __pycache__ directory.
At program start-up:
1. Read the cache
2. Unmarshal the objects
3. Allocate memory on the heap
4. Evaluate the code
Python 3.11 optimises this workflow by freezing key core modules, allowing these modules to be statically allocated directly into memory (effectively combining steps 1-3).
The result: 10-15% faster startup.
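A minimal sketch to see which of the startup modules are frozen on a given install (assuming a standard CPython 3.11 build; pass -X frozen_modules=off to fall back to the .pyc path):
import os, codecs, abc, io

for module in (os, codecs, abc, io):
    print(module.__name__, module.__spec__.origin)
# On a standard 3.11 install these print "frozen" rather than a
# path to a .py/.pyc file.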
Other performance goodness
- The function frame creation process is cheaper: frames are streamlined and frame objects are created lazily
- Recursive calls are faster, as Python-to-Python calls are now inlined and no longer consume a C stack frame (see the sketch below)
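As a rough way to feel the difference, a micro-benchmark sketch like this can be run under both 3.10 and 3.11 (absolute numbers will vary by machine; fib is just an illustrative call-heavy workload):
import timeit

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Deeply recursive workload: exercises frame creation and
# Python-to-Python calls
print(timeit.timeit("fib(25)", globals=globals(), number=10))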