How to use global variables correctly in Python
I just completed an online algorithm test for an interview. One of the problems asked me to traverse a tree structure and count how many paths satisfied a condition. It can be solved easily with a traditional DFS algorithm. This was my answer (pseudocode):
counter = 0

def dfs(root, current_state):
    # We should not use a global variable in production,
    # only in the interview
    global counter
    if current_state satisfies the condition:
        counter += 1
    for node in root.children:
        new_state = current_state + something
        dfs(node, new_state)

def main(root):
    dfs(root, init_state)
    return counter
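To make the pseudocode above concrete, here is a minimal runnable sketch. The Node class and the condition "the sum of values from the root to a node equals a target" are assumptions for illustration only; the real interview condition is not given. TARGET and count_paths are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical tree node: an integer value plus a list of children.
    value: int
    children: list = field(default_factory=list)

TARGET = 7   # hypothetical condition: path sum from the root equals TARGET
counter = 0  # module-level global, as in the interview answer

def dfs(root, current_state):
    global counter                  # required because counter is rebound below
    if current_state == TARGET:
        counter += 1
    for node in root.children:
        new_state = current_state + node.value
        dfs(node, new_state)

def count_paths(root):
    global counter
    counter = 0                     # without this reset, a second call would add to the old total
    dfs(root, root.value)
    return counter

tree = Node(3, [Node(4, [Node(0)]), Node(2, [Node(2)])])
print(count_paths(tree))  # 3  (paths 3-4, 3-4-0 and 3-2-2)
```

Note the explicit reset in count_paths: forgetting it is exactly the kind of hidden-state bug reviewers worry about with globals.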
The code works and passes all the test cases. Sadly, the reviewers from the dev team didn't like my solution because it uses a global variable. I know global variables are considered evil and I never use them in practice (that is why I left those comments in the code), and I agree with them that global and eval() can easily lead to bad code. There are several ways to avoid the global variable. The first is to use a class or a nested function (pseudocode):
class Solution:
    def main(self, root):
        self.counter = 0
        self.dfs(root, init_state)
        return self.counter

    def dfs(self, root, current_state):
        if current_state satisfies the condition:
            self.counter += 1
        for node in root.children:
            new_state = current_state + something
            self.dfs(node, new_state)
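The paragraph above also mentions a nested function, which the class example does not show. Here is a minimal sketch using nonlocal, under the same assumed path-sum condition (Node, count_paths and target are illustrative names, not from the original problem):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical tree node: an integer value plus a list of children.
    value: int
    children: list = field(default_factory=list)

def count_paths(root, target):
    counter = 0  # local to count_paths, shared with dfs through nonlocal

    def dfs(node, current_state):
        nonlocal counter            # rebind the enclosing function's variable
        if current_state == target:
            counter += 1
        for child in node.children:
            dfs(child, current_state + child.value)

    dfs(root, root.value)
    return counter

tree = Node(3, [Node(4, [Node(0)]), Node(2, [Node(2)])])
print(count_paths(tree, 7))  # 3
```

Each call creates a fresh counter, so no reset is needed between calls.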
Another solution is to replace the recursion with an explicit stack:
def main(root):
    counter = 0
    stack = [(root, init_state)]
    while stack:
        node, current_state = stack.pop()
        if current_state satisfies the condition:
            counter += 1
        for child in node.children:
            new_state = current_state + something
            stack.append((child, new_state))
    return counter
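A side benefit of the explicit stack is that it does not depend on Python's recursion limit (usually 1000 in CPython). This sketch, again assuming a hypothetical Node class and path-sum condition, handles a degenerate 5000-node chain that would raise RecursionError with the recursive versions under default settings:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical tree node: an integer value plus a list of children.
    value: int
    children: list = field(default_factory=list)

def count_paths(root, target):
    counter = 0
    stack = [(root, root.value)]        # (node, path sum from the root to that node)
    while stack:
        node, current_state = stack.pop()
        if current_state == target:
            counter += 1
        for child in node.children:
            stack.append((child, current_state + child.value))
    return counter

def make_chain(length):
    # Build a linked-list-shaped tree iteratively: every node has value 1.
    root = node = Node(1)
    for _ in range(length - 1):
        child = Node(1)
        node.children.append(child)
        node = child
    return root

print(count_paths(make_chain(5000), 2500))  # 1
```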
So we know how to avoid it. But does that mean we should never use global or eval() in production? I don't think so. If we want to know how to avoid them, we should also learn how to use them correctly. Know your enemy, and know him well, right?
What is a global variable in Python
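If you come from C/C++, a Python global variable is closer to a file-level static variable than to an extern variable: its scope is a single .py file (a module, in Python terms). Code in other modules can still reach it, but only through the module (module.foo after import module), or as a copy of its current value via from module import foo. Here are a few examples of how it behaves.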
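To illustrate the module scope described above, here is a small sketch. To keep it in one file, it builds a module in memory with types.ModuleType and exec(); in a real project settings would simply be a settings.py file. The module name and its contents are hypothetical.

```python
import types

# Build a throwaway module in memory; its __dict__ acts as its global namespace.
settings = types.ModuleType("settings")
exec(
    "counter = 0\n"
    "def bump():\n"
    "    global counter  # refers to settings.counter\n"
    "    counter += 1\n",
    settings.__dict__,
)

settings.bump()
print(settings.counter)            # 1: other code sees the global as a module attribute

snapshot = settings.counter        # like 'from settings import counter': copies the current binding
settings.bump()
print(settings.counter, snapshot)  # 2 1: the copy does not follow later rebinding
```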
foo = 100

def bar():
    # foo here is the global variable,
    # even though we didn't use the global keyword
    if foo == 100:
        # the function will return True
        return True
    return False
foo = 100

def bar():
    # assigning to foo makes it a local variable inside bar()
    foo = 10
    if foo == 100:
        return True
    # the function will return False
    return False
foo = 100

def bar():
    # without the 'global' keyword, the assignment in foo += 10 makes foo
    # local to the whole function, so reading it raises UnboundLocalError
    foo += 10
    if foo == 100:
        return True
    return False
foo = 100

def bar():
    # Now it works
    global foo
    foo += 10
    if foo == 100:
        return True
    # the function will return False
    return False
As you can see, the global keyword is required to rebind a module-level name inside a function, and it also tells readers explicitly which names refer to global variables.
When not to use it
Global variables can break thread safety and make unit tests fragile, because any function can modify them and that state leaks from one call (or one test) into the next. There are many articles explaining why not to use global variables, such as "Global Variables Are Bad" and "Why are global variables evil?".
When to use it
I dug into the CPython standard library source to see how it uses global variables correctly. Here are some situations where using one makes sense.
1. Cache
In zipfile.py, _ZipDecrypter uses the global variable _crctable as a cache, so the time-consuming table construction with _gen_crc runs only once instead of on every call.
_crctable = None

def _gen_crc(crc):
    for j in range(8):
        if crc & 1:
            crc = (crc >> 1) ^ 0xEDB88320
        else:
            crc >>= 1
    return crc

def _ZipDecrypter(pwd):
    key0 = 305419896
    key1 = 591751049
    key2 = 878082192

    global _crctable
    if _crctable is None:
        _crctable = list(map(_gen_crc, range(256)))
    crctable = _crctable
    ...
2. Global State
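In shutil.py, the module-level flag _USE_CP_SENDFILE records whether the fast os.sendfile() copy path can be used (the os module provides sendfile() and the platform is Linux). If sendfile() later turns out not to support copies between regular files, the flag is switched off, so later copies skip the fast path.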
_USE_CP_SENDFILE = hasattr(os, "sendfile") and sys.platform.startswith("linux")

def _fastcopy_sendfile(fsrc, fdst):
    global _USE_CP_SENDFILE
    ...
    if err.errno == errno.ENOTSOCK:
        # sendfile() on this platform (probably Linux < 2.6.33)
        # does not support copies between regular files (only
        # sockets).
        _USE_CP_SENDFILE = False
        raise _GiveupOnFastCopy(err)
3. Global data
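Global variables are also used to share default data across a module. In mimetypes.py, _default_mime_types() fills the global suffix_map (and several other tables) with defaults; suffix_map maps shorthand suffixes to their expanded forms, such as '.tgz' to '.tar.gz'.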
def _default_mime_types():
    global suffix_map, _suffix_map_default
    global encodings_map, _encodings_map_default
    global types_map, _types_map_default
    global common_types, _common_types_default

    suffix_map = _suffix_map_default = {
        '.svgz': '.svg.gz',
        '.tgz': '.tar.gz',
        '.taz': '.tar.gz',
        '.tz': '.tar.gz',
        '.tbz2': '.tar.bz2',
        '.txz': '.tar.xz',
        }
    ...
4. Initialization
Just to be clear: most of the time only one function should update a global variable, and all other functions should only read it. In gettext.py, only the textdomain() function modifies _current_domain:
def textdomain(domain=None):
    global _current_domain
    if domain is not None:
        _current_domain = domain
    return _current_domain
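A minimal sketch of the single-writer rule in the style of textdomain(): one setter/getter owns the global, and every other function only reads it. The locale setting, set_locale() and greeting() are hypothetical names for illustration.

```python
_current_locale = "en_US"   # hypothetical module-level setting

def set_locale(locale=None):
    """The only writer: update the setting when a value is given,
    and always return the current value (like gettext.textdomain())."""
    global _current_locale
    if locale is not None:
        _current_locale = locale
    return _current_locale

def greeting():
    # Readers just look the name up; no 'global' statement needed.
    return "Bonjour" if _current_locale.startswith("fr") else "Hello"

print(set_locale(), greeting())   # en_US Hello
set_locale("fr_FR")
print(greeting())                 # Bonjour
```

Because every write goes through set_locale(), a single breakpoint or log line there shows who changed the setting and when.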
Summary
I think global variables are not that evil once you understand how to use them correctly. :D