0001 # Feed begin ######################
0002 title: #Real Python#
0003 link: #https://realpython.com/#
0004 # Person begin ####################
0005 name: #Real Python#
0006 # Person end ######################
0007 # Item begin ######################
0008 id: #https://realpython.com/python-kwargs-and-args/#
0009 title: #Python args and kwargs: Demystified#
0010 link: #https://realpython.com/python-kwargs-and-args/#
0011 description: #In this step-by-step tutorial, you'll learn how to use args and kwargs in Python to add more flexibility to your functions. You'll also take a closer look at the single and double-asterisk unpacking operators, which you can use to unpack any iterable object in Python.#
0012 content: #<p>Sometimes, when you look at a function definition in Python, you might see that it takes two strange arguments: <strong><code>*args</code></strong> and <strong><code>**kwargs</code></strong>. If you&rsquo;ve ever wondered what these peculiar variables are, or why your IDE defines them in <code>main()</code>, then this article is for you. You&rsquo;ll learn how to use args and kwargs in Python to add more flexibility to your functions.</p>
0013 <p><strong>By the end of the article, you&rsquo;ll know:</strong></p>
0014 <ul>
0015 <li>What <code>*args</code> and <code>**kwargs</code> actually mean</li>
0016 <li>How to use <code>*args</code> and <code>**kwargs</code> in function definitions</li>
0017 <li>How to use a single asterisk (<code>*</code>) to unpack iterables</li>
0018 <li>How to use two asterisks (<code>**</code>) to unpack dictionaries</li>
0019 </ul>
0020 <p>This article assumes that you already know how to define Python functions and work with <a href="https://realpython.com/lessons/mutable-data-structures-lists-dictionaries/">lists and dictionaries</a>.</p>
0022 
0023 <h2 id="passing-multiple-arguments-to-a-function">Passing Multiple Arguments to a Function</h2>
0024 <p><strong><code>*args</code></strong> and <strong><code>**kwargs</code></strong> allow you to pass a variable number of positional arguments or keyword arguments to a function. Consider the following example. This is a simple function that takes two arguments and returns their sum:</p>
0025 <div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
0026     <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
0027 </pre></div>
0028 
0029 <p>This function works fine, but it&rsquo;s limited to only two arguments. What if you need to sum a varying number of arguments, where the specific number of arguments passed is only determined at runtime? Wouldn&rsquo;t it be great to create a function that could sum <em>all</em> the integers passed to it, no matter how many there are?</p>
0030 <h2 id="using-the-python-args-variable-in-function-definitions">Using the Python args Variable in Function Definitions</h2>
0031 <p>There are a few ways you can pass a varying number of arguments to a function. The first way is often the most intuitive for people who have experience with collections. You simply pass a list or a <a href="https://realpython.com/python-sets/">set</a> of all the arguments to your function. So for <code>my_sum()</code>, you could pass a list of all the integers you need to add:</p>
0032 <div class="highlight python"><pre><span></span><span class="c1"># sum_integers_list.py</span>
0033 <span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="n">my_integers</span><span class="p">):</span>
0034     <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
0035     <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">my_integers</span><span class="p">:</span>
0036         <span class="n">result</span> <span class="o">+=</span> <span class="n">x</span>
0037     <span class="k">return</span> <span class="n">result</span>
0038 
0039 <span class="n">list_of_integers</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
0040 <span class="nb">print</span><span class="p">(</span><span class="n">my_sum</span><span class="p">(</span><span class="n">list_of_integers</span><span class="p">))</span>
0041 </pre></div>
0042 
0043 <p>This implementation works, but whenever you call this function you&rsquo;ll also need to create a list of arguments to pass to it. This can be inconvenient, especially if you don&rsquo;t know up front all the values that should go into the list.</p>
0044 <p>This is where <code>*args</code> can be really useful, because it allows you to pass a varying number of positional arguments. Take the following example:</p>
0045 <div class="highlight python"><pre><span></span><span class="c1"># sum_integers_args.py</span>
0046 <span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
0047     <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
0048     <span class="c1"># Iterating over the Python args tuple</span>
0049     <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">args</span><span class="p">:</span>
0050         <span class="n">result</span> <span class="o">+=</span> <span class="n">x</span>
0051     <span class="k">return</span> <span class="n">result</span>
0052 
0053 <span class="nb">print</span><span class="p">(</span><span class="n">my_sum</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
0054 </pre></div>
0055 
0056 <p>In this example, you&rsquo;re no longer passing a list to <code>my_sum()</code>. Instead, you&rsquo;re passing three different positional arguments. <code>my_sum()</code> takes all the arguments provided in the call and packs them into a single iterable object named <code>args</code>.</p>
0057 <p>Note that <strong><code>args</code> is just a name.</strong> You&rsquo;re not required to use the name <code>args</code>. You can choose any name that you prefer, such as <code>integers</code>:</p>
0058 <div class="highlight python"><pre><span></span><span class="c1"># sum_integers_args_2.py</span>
0059 <span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">integers</span><span class="p">):</span>
0060     <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
0061     <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">integers</span><span class="p">:</span>
0062         <span class="n">result</span> <span class="o">+=</span> <span class="n">x</span>
0063     <span class="k">return</span> <span class="n">result</span>
0064 
0065 <span class="nb">print</span><span class="p">(</span><span class="n">my_sum</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
0066 </pre></div>
0067 
0068 <p>The function still works, even though the iterable object is now named <code>integers</code> instead of <code>args</code>. All that matters here is that you use the <strong>unpacking operator</strong> (<code>*</code>).</p>
0069 <p>Bear in mind that the iterable object you&rsquo;ll get using the unpacking operator <code>*</code> is <a href="https://realpython.com/python-lists-tuples/">not a <code>list</code> but a <code>tuple</code></a>. A <code>tuple</code> is similar to a <code>list</code> in that they both support slicing and iteration. However, tuples are very different in at least one aspect: lists are <a href="https://realpython.com/courses/immutability-python/">mutable</a>, while tuples are not. To test this, run the following code. This script tries to change a value of a list:</p>
0070 <div class="highlight python"><pre><span></span><span class="c1"># change_list.py</span>
0071 <span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
0072 <span class="n">my_list</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">9</span>
0073 <span class="nb">print</span><span class="p">(</span><span class="n">my_list</span><span class="p">)</span>
0074 </pre></div>
0075 
0076 <p>The value located at the very first index of the list should be updated to <code>9</code>. If you execute this script, you will see that the list indeed gets modified:</p>
0077 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python change_list.py
0078 <span class="go">[9, 2, 3]</span>
0079 </pre></div>
0080 
0081 <p>The first value is no longer <code>1</code>, but the updated value <code>9</code>. Now, try to do the same with a tuple:</p>
0082 <div class="highlight python"><pre><span></span><span class="c1"># change_tuple.py</span>
0083 <span class="n">my_tuple</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
0084 <span class="n">my_tuple</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">9</span>
0085 <span class="nb">print</span><span class="p">(</span><span class="n">my_tuple</span><span class="p">)</span>
0086 </pre></div>
0087 
0088 <p>Here, you see the same values, except they&rsquo;re held together as a tuple. If you try to execute this script, you will see that the Python interpreter raises an <a href="https://realpython.com/python-exceptions/">error</a>:</p>
0089 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python change_tuple.py
0090 <span class="go">Traceback (most recent call last):</span>
0091 <span class="go">  File &quot;change_tuple.py&quot;, line 3, in &lt;module&gt;</span>
0092 <span class="go">    my_tuple[0] = 9</span>
0093 <span class="go">TypeError: &#39;tuple&#39; object does not support item assignment</span>
0094 </pre></div>
0095 
0096 <p>This is because a tuple is an immutable object, and its values cannot be changed after assignment. Keep this in mind when you&rsquo;re working with tuples and <code>*args</code>.</p>
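<p>As a quick illustration, here is a minimal sketch that confirms <code>args</code> arrives as a tuple and shows one common workaround when you do need to change the values: copy them into a list first. The helper names <code>show_args()</code> and <code>double_first()</code> are hypothetical, chosen only for this example.</p>

```python
def show_args(*args):
    # Whatever name you choose, the packed positional arguments form a tuple
    print(type(args).__name__, args)
    return args


def double_first(*args):
    # Tuples cannot be modified in place, so copy the values into a list first
    values = list(args)
    if values:
        values[0] *= 2
    return values


show_args(1, 2, 3)            # prints: tuple (1, 2, 3)
print(double_first(1, 2, 3))  # prints: [2, 2, 3]
```

<p>The list is a separate copy, so changing it has no effect on the tuple that <code>args</code> still refers to.</p>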
0097 <h2 id="using-the-python-kwargs-variable-in-function-definitions">Using the Python kwargs Variable in Function Definitions</h2>
0098 <p>Okay, now you&rsquo;ve understood what <code>*args</code> is for, but what about <code>**kwargs</code>? <code>**kwargs</code> works just like <code>*args</code>, but instead of accepting positional arguments it accepts keyword (or <strong>named</strong>) arguments. Take the following example:</p>
0099 <div class="highlight python"><pre><span></span><span class="c1"># concatenate.py</span>
0100 <span class="k">def</span> <span class="nf">concatenate</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
0101     <span class="n">result</span> <span class="o">=</span> <span class="s2">&quot;&quot;</span>
0102     <span class="c1"># Iterating over the Python kwargs dictionary</span>
0103     <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">kwargs</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
0104         <span class="n">result</span> <span class="o">+=</span> <span class="n">arg</span>
0105     <span class="k">return</span> <span class="n">result</span>
0106 
0107 <span class="nb">print</span><span class="p">(</span><span class="n">concatenate</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="s2">&quot;Real&quot;</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="s2">&quot;Python&quot;</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s2">&quot;Is&quot;</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="s2">&quot;Great&quot;</span><span class="p">,</span> <span class="n">e</span><span class="o">=</span><span class="s2">&quot;!&quot;</span><span class="p">))</span>
0108 </pre></div>
0109 
0110 <p>When you execute the script above, <code>concatenate()</code> will iterate through the Python kwargs <a href="https://realpython.com/python-dicts/">dictionary</a> and concatenate all the values it finds:</p>
0111 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python concatenate.py
0112 <span class="go">RealPythonIsGreat!</span>
0113 </pre></div>
0114 
0115 <p>Like <code>args</code>, <code>kwargs</code> is just a name that can be changed to whatever you want. Again, what is important here is the use of the <strong>unpacking operator</strong> (<code>**</code>).</p>
0116 <p>So, the previous example could be written like this:</p>
0117 <div class="highlight python"><pre><span></span><span class="c1"># concatenate_2.py</span>
0118 <span class="k">def</span> <span class="nf">concatenate</span><span class="p">(</span><span class="o">**</span><span class="n">words</span><span class="p">):</span>
0119     <span class="n">result</span> <span class="o">=</span> <span class="s2">&quot;&quot;</span>
0120     <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">words</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
0121         <span class="n">result</span> <span class="o">+=</span> <span class="n">arg</span>
0122     <span class="k">return</span> <span class="n">result</span>
0123 
0124 <span class="nb">print</span><span class="p">(</span><span class="n">concatenate</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="s2">&quot;Real&quot;</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="s2">&quot;Python&quot;</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s2">&quot;Is&quot;</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="s2">&quot;Great&quot;</span><span class="p">,</span> <span class="n">e</span><span class="o">=</span><span class="s2">&quot;!&quot;</span><span class="p">))</span>
0125 </pre></div>
0126 
0127 <p>Note that in the example above the iterable object is a standard <code>dict</code>. If you <a href="https://realpython.com/iterate-through-dictionary-python/">iterate over the dictionary</a> and want to return its values, like in the example shown, then you must use <code>.values()</code>.</p>
0128 <p>In fact, if you forget to use this method, you will find yourself iterating through the <strong>keys</strong> of your Python kwargs dictionary instead, like in the following example:</p>
0129 <div class="highlight python"><pre><span></span><span class="c1"># concatenate_keys.py</span>
0130 <span class="k">def</span> <span class="nf">concatenate</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
0131     <span class="n">result</span> <span class="o">=</span> <span class="s2">&quot;&quot;</span>
0132     <span class="c1"># Iterating over the keys of the Python kwargs dictionary</span>
0133     <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">kwargs</span><span class="p">:</span>
0134         <span class="n">result</span> <span class="o">+=</span> <span class="n">arg</span>
0135     <span class="k">return</span> <span class="n">result</span>
0136 
0137 <span class="nb">print</span><span class="p">(</span><span class="n">concatenate</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="s2">&quot;Real&quot;</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="s2">&quot;Python&quot;</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s2">&quot;Is&quot;</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="s2">&quot;Great&quot;</span><span class="p">,</span> <span class="n">e</span><span class="o">=</span><span class="s2">&quot;!&quot;</span><span class="p">))</span>
0138 </pre></div>
0139 
0140 <p>Now, if you try to execute this example, you&rsquo;ll notice the following output:</p>
0141 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python concatenate_keys.py
0142 <span class="go">abcde</span>
0143 </pre></div>
0144 
0145 <p>As you can see, if you don&rsquo;t specify <code>.values()</code>, your function will iterate over the keys of your Python kwargs dictionary, returning the wrong result.</p>
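<p>If you need both the keyword names and their values, <code>.items()</code> yields them as pairs. The following is a minimal sketch using a hypothetical <code>describe()</code> function; it relies on the fact that, since Python 3.6, <code>kwargs</code> preserves the order in which the keyword arguments were passed (PEP 468).</p>

```python
def describe(**kwargs):
    # kwargs is a plain dict: keys are the keyword names, values are the arguments
    parts = []
    for key, value in kwargs.items():
        parts.append(f"{key}={value}")
    return ", ".join(parts)


print(describe(a="Real", b="Python", c="Is", d="Great", e="!"))
# prints: a=Real, b=Python, c=Is, d=Great, e=!
```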
0146 <h2 id="ordering-arguments-in-a-function">Ordering Arguments in a Function</h2>
0147 <p>Now that you have learned what <code>*args</code> and <code>**kwargs</code> are for, you are ready to start writing functions that take a varying number of input arguments. But what if you want to create a function that takes a changeable number of both positional <em>and</em> named arguments?</p>
0148 <p>In this case, you have to bear in mind that <strong>order counts</strong>. Just as non-default arguments have to precede default arguments, so <code>*args</code> must come before <code>**kwargs</code>.</p>
0149 <p>To recap, the correct order for your parameters is:</p>
0150 <ol>
0151 <li>Standard arguments</li>
0152 <li><code>*args</code> arguments</li>
0153 <li><code>**kwargs</code> arguments</li>
0154 </ol>
0155 <p>For example, this function definition is correct:</p>
0156 <div class="highlight python"><pre><span></span><span class="c1"># correct_function_definition.py</span>
0157 <span class="k">def</span> <span class="nf">my_function</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
0158     <span class="k">pass</span>
0159 </pre></div>
0160 
0161 <p>The <code>*args</code> variable is appropriately listed before <code>**kwargs</code>. But what if you try to change the order of the parameters? For example, consider the following function:</p>
0162 <div class="highlight python"><pre><span></span><span class="c1"># wrong_function_definition.py</span>
0163 <span class="k">def</span> <span class="nf">my_function</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
0164     <span class="k">pass</span>
0165 </pre></div>
0166 
0167 <p>Now, <code>**kwargs</code> comes before <code>*args</code> in the function definition. If you try to run this example, you&rsquo;ll receive an error from the interpreter:</p>
0168 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python wrong_function_definition.py
0169 <span class="go">  File &quot;wrong_function_definition.py&quot;, line 2</span>
0170 <span class="go">    def my_function(a, b, **kwargs, *args):</span>
0171 <span class="go">                                    ^</span>
0172 <span class="go">SyntaxError: invalid syntax</span>
0173 </pre></div>
0174 
0175 <p>In this case, since <code>*args</code> comes after <code>**kwargs</code>, the Python interpreter throws a <code>SyntaxError</code>.</p>
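<p>To see what each kind of parameter receives when the order is correct, you could call the valid definition from <code>correct_function_definition.py</code> with a mix of arguments. In this sketch the body returns its parameters instead of using <code>pass</code>, purely so the result can be printed:</p>

```python
def my_function(a, b, *args, **kwargs):
    # a and b take the first two positional arguments,
    # args collects any remaining positional arguments in a tuple,
    # and kwargs collects every keyword argument in a dict
    return a, b, args, kwargs


print(my_function(1, 2, 3, 4, x=5, y=6))
# prints: (1, 2, (3, 4), {'x': 5, 'y': 6})
print(my_function(1, 2))
# prints: (1, 2, (), {})
```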
0176 <h2 id="unpacking-with-the-asterisk-operators">Unpacking With the Asterisk Operators: <code>*</code> &amp; <code>**</code></h2>
0177 <p>You are now able to use <code>*args</code> and <code>**kwargs</code> to define Python functions that take a varying number of input arguments. Let&rsquo;s go a little deeper to understand something more about the <strong>unpacking operators</strong>.</p>
0178 <p>The single and double asterisk unpacking operators were introduced in Python 2. As of the 3.5 release, they have become even more powerful, thanks to <a href="https://www.python.org/dev/peps/pep-0448/">PEP 448</a>. In short, the unpacking operators unpack the values from iterable objects in Python. The single asterisk operator <code>*</code> can be used on any iterable that Python provides, while the double asterisk operator <code>**</code> can only be used on dictionaries and other mappings.</p>
0179 <p>Let&rsquo;s start with an example:</p>
0180 <div class="highlight python"><pre><span></span><span class="c1"># print_list.py</span>
0181 <span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
0182 <span class="nb">print</span><span class="p">(</span><span class="n">my_list</span><span class="p">)</span>
0183 </pre></div>
0184 
0185 <p>This code defines a list and then prints it to the standard output:</p>
0186 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python print_list.py
0187 <span class="go">[1, 2, 3]</span>
0188 </pre></div>
0189 
0190 <p>Note how the list is printed, along with the corresponding brackets and commas.</p>
0191 <p>Now, try to prepend the unpacking operator <code>*</code> to the name of your list:</p>
0192 <div class="highlight python"><pre><span></span><span class="c1"># print_unpacked_list.py</span>
0193 <span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
0194 <span class="nb">print</span><span class="p">(</span><span class="o">*</span><span class="n">my_list</span><span class="p">)</span>
0195 </pre></div>
0196 
0197 <p>Here, the <code>*</code> operator unpacks the list before <code>print()</code> is called, so <code>print()</code> receives each item as a separate argument.</p>
0198 <p>In this case, the output is no longer the list itself, but rather <em>the content</em> of the list:</p>
0199 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python print_unpacked_list.py
0200 <span class="go">1 2 3</span>
0201 </pre></div>
0202 
0203 <p>Can you see the difference between this execution and the one from <code>print_list.py</code>? Instead of a list, <code>print()</code> has taken three separate arguments as the input.</p>
0204 <p>Another thing you&rsquo;ll notice is that in <code>print_unpacked_list.py</code>, you used the unpacking operator <code>*</code> to call a function, instead of in a function definition. In this case, <code>print()</code> receives the items of the list as though you had passed them as separate arguments.</p>
0205 <p>You can also use this method to call your own functions, but if your function requires a specific number of arguments, then the iterable you unpack must contain exactly that many items.</p>
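<p>The single asterisk is not limited to lists: any iterable can be unpacked into a call. The sketch below uses a hypothetical <code>collect()</code> function that simply returns the positional arguments it receives, so you can see exactly what the <code>*</code> operator produced. Note that unpacking a dictionary with a single <code>*</code> passes only its keys.</p>

```python
def collect(*args):
    # Return the positional arguments exactly as the function received them
    return args


my_tuple = (1, 2, 3)
my_string = "abc"
my_dict = {"x": 1, "y": 2}

print(collect(*my_tuple))   # prints: (1, 2, 3)
print(collect(*my_string))  # prints: ('a', 'b', 'c')
print(collect(*range(3)))   # prints: (0, 1, 2)
print(collect(*my_dict))    # prints: ('x', 'y')
print(*my_string)           # prints: a b c
```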
0206 <p>To test this behavior, consider this script:</p>
0207 <div class="highlight python"><pre><span></span><span class="c1"># unpacking_call.py</span>
0208 <span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">):</span>
0209     <span class="nb">print</span><span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">+</span> <span class="n">c</span><span class="p">)</span>
0210 
0211 <span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
0212 <span class="n">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">my_list</span><span class="p">)</span>
0213 </pre></div>
0214 
0215 <p>Here, <code>my_sum()</code> explicitly states that <code>a</code>, <code>b</code>, and <code>c</code> are required arguments.</p>
0216 <p>If you run this script, you&rsquo;ll get the sum of the three numbers in <code>my_list</code>:</p>
0217 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python unpacking_call.py
0218 <span class="go">6</span>
0219 </pre></div>
0220 
0221 <p>The 3 elements in <code>my_list</code> match up perfectly with the required arguments in <code>my_sum()</code>.</p>
0222 <p>Now look at the following script, where <code>my_list</code> has 4 items instead of 3:</p>
0223 <div class="highlight python"><pre><span></span><span class="c1"># wrong_unpacking_call.py</span>
0224 <span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">):</span>
0225     <span class="nb">print</span><span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">+</span> <span class="n">c</span><span class="p">)</span>
0226 
0227 <span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span>
0228 <span class="n">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">my_list</span><span class="p">)</span>
0229 </pre></div>
0230 
0231 <p>In this example, <code>my_sum()</code> still expects just three arguments, but the <code>*</code> operator gets 4 items from the list. If you try to execute this script, you&rsquo;ll see that the Python interpreter is unable to run it:</p>
0232 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python wrong_unpacking_call.py
0233 <span class="go">Traceback (most recent call last):</span>
0234 <span class="go">  File &quot;wrong_unpacking_call.py&quot;, line 6, in &lt;module&gt;</span>
0235 <span class="go">    my_sum(*my_list)</span>
0236 <span class="go">TypeError: my_sum() takes 3 positional arguments but 4 were given</span>
0237 </pre></div>
0238 
0239 <p>When you use the <code>*</code> operator to unpack a list and pass arguments to a function, it&rsquo;s exactly as though you had passed each item as a separate argument. This means that you can use multiple unpacking operators to get values from several lists and pass them all to a single function.</p>
0240 <p>To test this behavior, consider the following example:</p>
0241 <div class="highlight python"><pre><span></span><span class="c1"># sum_integers_args_3.py</span>
0242 <span class="k">def</span> <span class="nf">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
0243     <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
0244     <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">args</span><span class="p">:</span>
0245         <span class="n">result</span> <span class="o">+=</span> <span class="n">x</span>
0246     <span class="k">return</span> <span class="n">result</span>
0247 
0248 <span class="n">list1</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
0249 <span class="n">list2</span> <span class="o">=</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
0250 <span class="n">list3</span> <span class="o">=</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">]</span>
0251 
0252 <span class="nb">print</span><span class="p">(</span><span class="n">my_sum</span><span class="p">(</span><span class="o">*</span><span class="n">list1</span><span class="p">,</span> <span class="o">*</span><span class="n">list2</span><span class="p">,</span> <span class="o">*</span><span class="n">list3</span><span class="p">))</span>
0253 </pre></div>
0254 
0255 <p>If you run this example, all three lists are unpacked. Each individual item is passed to <code>my_sum()</code>, resulting in the following output:</p>
0256 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python sum_integers_args_3.py
0257 <span class="go">45</span>
0258 </pre></div>
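<p>The double asterisk works the same way in a function call: each key of the dictionary is matched to the parameter with the same name. Here is a minimal sketch that reuses the three-parameter <code>my_sum()</code> from <code>unpacking_call.py</code>, modified to return the sum rather than print it:</p>

```python
def my_sum(a, b, c):
    return a + b + c


my_dict = {"a": 1, "b": 2, "c": 3}
print(my_sum(**my_dict))  # prints: 6

# Keys are matched by name, so their order in the dict does not matter
print(my_sum(**{"c": 30, "a": 10, "b": 20}))  # prints: 60

# Both operators can be combined in a single call
print(my_sum(*[1, 2], **{"c": 3}))  # prints: 6
```

<p>A key that does not match any parameter name, such as <code>"d"</code>, raises a <code>TypeError</code>, just as the extra list item did in <code>wrong_unpacking_call.py</code>.</p>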
0259 
0260 <p>There are other convenient uses of the unpacking operator. For example, say you need to split a list into three different parts. The output should show the first value, the last value, and all the values in between. With the unpacking operator, you can do this in just one line of code:</p>
0261 <div class="highlight python"><pre><span></span><span class="c1"># extract_list_body.py</span>
0262 <span class="n">my_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
0263 
0264 <span class="n">a</span><span class="p">,</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="n">c</span> <span class="o">=</span> <span class="n">my_list</span>
0265 
0266 <span class="nb">print</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
0267 <span class="nb">print</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
0268 <span class="nb">print</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
0269 </pre></div>
0270 
0271 <p>In this example, <code>my_list</code> contains 6 items. The first item is assigned to <code>a</code>, the last to <code>c</code>, and all other items are packed into a new list <code>b</code>. If you run the <a href="https://realpython.com/run-python-scripts/">script</a>, <code>print()</code> will show you that your three variables have the values you would expect:</p>
0272 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python extract_list_body.py
0273 <span class="go">1</span>
0274 <span class="go">[2, 3, 4, 5]</span>
0275 <span class="go">6</span>
0276 </pre></div>
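<p>The starred name in an assignment like <code>a, *b, c = my_list</code> always receives a list, even when there is nothing left to capture, and it can sit at any position. Only one starred name is allowed per assignment; otherwise Python raises a <code>SyntaxError</code>. A short sketch:</p>

```python
# The starred name gets an empty list when no items remain for it
first, *middle, last = [1, 2]
print(first, middle, last)  # prints: 1 [] 2

# The starred name can also come first or last
*head, tail = (1, 2, 3)
print(head, tail)  # prints: [1, 2] 3

# Any iterable works on the right-hand side, including a string
first_char, *rest = "Python"
print(first_char, rest)  # prints: P ['y', 't', 'h', 'o', 'n']
```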
0277 
0278 <p>Another interesting thing you can do with the unpacking operator <code>*</code> is to split the items of any iterable object. This could be very useful if you need to merge two lists, for instance:</p>
0279 <div class="highlight python"><pre><span></span><span class="c1"># merging_lists.py</span>
0280 <span class="n">my_first_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
0281 <span class="n">my_second_list</span> <span class="o">=</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
0282 <span class="n">my_merged_list</span> <span class="o">=</span> <span class="p">[</span><span class="o">*</span><span class="n">my_first_list</span><span class="p">,</span> <span class="o">*</span><span class="n">my_second_list</span><span class="p">]</span>
0283 
0284 <span class="nb">print</span><span class="p">(</span><span class="n">my_merged_list</span><span class="p">)</span>
0285 </pre></div>
0286 
0287 <p>The unpacking operator <code>*</code> is prepended to both <code>my_first_list</code> and <code>my_second_list</code>.</p>
0288 <p>If you run this script, you&rsquo;ll see that the result is a merged list:</p>
0289 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python merging_lists.py
0290 <span class="go">[1, 2, 3, 4, 5, 6]</span>
0291 </pre></div>
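<p>Because <code>*</code> accepts any iterable, you aren&rsquo;t limited to merging lists. The sketch below, with illustrative names, spreads a <code>range</code>, a string, and a tuple into a single list. It also checks that for two plain lists the result matches concatenation with <code>+</code>:</p>
<div class="highlight python"><pre><span></span># merging_iterables.py
mixed = [*range(3), *"ab", *(4, 5)]
print(mixed)  # [0, 1, 2, 'a', 'b', 4, 5]

# For two lists, unpacking gives the same result as concatenation
my_first_list = [1, 2, 3]
my_second_list = [4, 5, 6]
print([*my_first_list, *my_second_list] == my_first_list + my_second_list)  # True
</pre></div>

<p>Unlike <code>+</code>, which needs both operands to be lists, the unpacking form works with any mix of iterables.</p>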
0292 
0293 <p>You can even merge two different dictionaries by using the unpacking operator <code>**</code>:</p>
0294 <div class="highlight python"><pre><span></span><span class="c1"># merging_dicts.py</span>
0295 <span class="n">my_first_dict</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&quot;A&quot;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;B&quot;</span><span class="p">:</span> <span class="mi">2</span><span class="p">}</span>
0296 <span class="n">my_second_dict</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&quot;C&quot;</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="s2">&quot;D&quot;</span><span class="p">:</span> <span class="mi">4</span><span class="p">}</span>
0297 <span class="n">my_merged_dict</span> <span class="o">=</span> <span class="p">{</span><span class="o">**</span><span class="n">my_first_dict</span><span class="p">,</span> <span class="o">**</span><span class="n">my_second_dict</span><span class="p">}</span>
0298 
0299 <span class="nb">print</span><span class="p">(</span><span class="n">my_merged_dict</span><span class="p">)</span>
0300 </pre></div>
0301 
0302 <p>Here, the dictionaries to merge are <code>my_first_dict</code> and <code>my_second_dict</code>.</p>
0303 <p>Executing this code outputs a merged dictionary:</p>
0304 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python merging_dicts.py
0305 <span class="go">{&#39;A&#39;: 1, &#39;B&#39;: 2, &#39;C&#39;: 3, &#39;D&#39;: 4}</span>
0306 </pre></div>
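<p>When both dictionaries contain the same key, the value from the dictionary unpacked later wins. This sketch, with illustrative names, shows that override behavior. On Python 3.9 and later, the <code>|</code> operator produces the same merged result:</p>
<div class="highlight python"><pre><span></span># merging_dicts_overlap.py
defaults = {"A": 1, "B": 2}
overrides = {"B": 20, "C": 3}

merged = {**defaults, **overrides}
print(merged)  # {'A': 1, 'B': 20, 'C': 3}

# Swapping the order changes which value survives for "B"
print({**overrides, **defaults})  # {'B': 2, 'C': 3, 'A': 1}
</pre></div>
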
0307 
0308 <p>Remember that the <code>*</code> operator works on <em>any</em> iterable object. It can also be used to unpack a <a href="https://realpython.com/python-strings/">string</a>:</p>
0309 <div class="highlight python"><pre><span></span><span class="c1"># string_to_list.py</span>
0310 <span class="n">a</span> <span class="o">=</span> <span class="p">[</span><span class="o">*</span><span class="s2">&quot;RealPython&quot;</span><span class="p">]</span>
0311 <span class="nb">print</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
0312 </pre></div>
0313 
0314 <p>In Python, strings are iterable objects, so <code>*</code> unpacks the string and places each of its characters in the list <code>a</code>:</p>
0315 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python string_to_list.py
0316 <span class="go">[&#39;R&#39;, &#39;e&#39;, &#39;a&#39;, &#39;l&#39;, &#39;P&#39;, &#39;y&#39;, &#39;t&#39;, &#39;h&#39;, &#39;o&#39;, &#39;n&#39;]</span>
0317 </pre></div>
0318 
0319 <p>The previous example looks neat, but when you work with these operators, it&rsquo;s important to keep in mind the seventh rule of <a href="https://www.python.org/dev/peps/pep-0020/"><em>The Zen of Python</em></a> by Tim Peters: <em>Readability counts</em>.</p>
0320 <p>To see why, consider the following example:</p>
0321 <div class="highlight python"><pre><span></span><span class="c1"># mysterious_statement.py</span>
0322 <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="o">=</span> <span class="s2">&quot;RealPython&quot;</span>
0323 <span class="nb">print</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
0324 </pre></div>
0325 
0326 <p>There&rsquo;s the unpacking operator <code>*</code>, followed by a variable, a comma, and an assignment. That&rsquo;s a lot packed into one line! In fact, this code does the same thing as the previous example: it takes the string <code>RealPython</code> and assigns all of its characters to the new list <code>a</code>, thanks to the unpacking operator <code>*</code>.</p>
0327 <p>The comma after the <code>a</code> does the trick. When you use the unpacking operator in an assignment, Python requires the starred target to be part of a list or tuple of targets, so a bare <code>*a = "RealPython"</code> is a <code>SyntaxError</code>. With the trailing comma, the left-hand side becomes a tuple containing the single starred target <code>*a</code>, and <code>a</code> itself still receives a list.</p>
0328 <p>While this is a neat trick, many Pythonistas would not consider this code to be very readable. As such, it&rsquo;s best to use these kinds of constructions sparingly.</p>
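<p>If you want to check this yourself, the following sketch compares the trailing-comma form with more explicit alternatives that build the same list. It then uses <code>compile()</code> to show that a bare starred target is rejected at compile time. The file name passed to <code>compile()</code> is arbitrary:</p>
<div class="highlight python"><pre><span></span># starred_alternatives.py
*a, = "RealPython"
b = [*"RealPython"]
c = list("RealPython")
[*d] = "RealPython"  # A list of targets works too

print(a == b == c == d)  # True

# Without the comma, the starred target isn't inside a list or tuple
try:
    compile('*e = "RealPython"', "example.py", "exec")
except SyntaxError as error:
    print("SyntaxError:", error.msg)
</pre></div>

<p>Of these four forms, <code>list("RealPython")</code> is arguably the one most readers will understand at a glance.</p>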
0329 <h2 id="conclusion">Conclusion</h2>
0330 <p>You are now able to use <strong><code>*args</code></strong> and <strong><code>**kwargs</code></strong> to accept a variable number of arguments in your functions. You have also learned more about the unpacking operators.</p>
0331 <p>You&rsquo;ve learned:</p>
0332 <ul>
0333 <li>What <code>*args</code> and <code>**kwargs</code> actually mean</li>
0334 <li>How to use <code>*args</code> and <code>**kwargs</code> in function definitions</li>
0335 <li>How to use a single asterisk (<code>*</code>) to unpack iterables</li>
0336 <li>How to use two asterisks (<code>**</code>) to unpack dictionaries</li>
0337 </ul>
0338 <p>If you still have questions, don&rsquo;t hesitate to reach out in the comments section below! To learn more about the use of the asterisks in Python, have a look at <a href="https://treyhunner.com/2018/10/asterisks-in-python-what-they-are-and-how-to-use-them/">Trey Hunner&rsquo;s article on the subject</a>.</p>
0339         <hr />
0340         <p><em>[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short &amp; sweet Python Trick delivered to your inbox every couple of days. <a href="https://realpython.com/python-tricks/?utm_source=realpython&amp;utm_medium=rss&amp;utm_campaign=footer">&gt;&gt; Click here to learn more and see examples</a> ]</em></p>#
0341 datePublished: #Wed Sep 4 14:00:00 2019#
0342 dateUpdated: #Wed Sep 4 14:00:00 2019#
0343 # Person begin ####################
0344 name: #Real Python#
0345 # Person end ######################
0346 # Item end ########################
0347 # Item begin ######################
0348 id: #https://realpython.com/courses/lists-tuples-python/#
0349 title: #Lists and Tuples in Python#
0350 link: #https://realpython.com/courses/lists-tuples-python/#
0351 description: #In this course, you&apos;ll cover the important characteristics of lists and tuples in Python 3. You&apos;ll learn how to define them and how to manipulate them. When you&apos;re finished, you&apos;ll have a good feel for when and how to use these object types in a Python program.#
0352 content: #<p>In this course, you&rsquo;ll learn about working with lists and tuples. <strong>Lists</strong> and <strong>tuples</strong> are arguably Python&rsquo;s most versatile, useful <a href="https://realpython.com/python-data-types/">data types</a>. You&rsquo;ll find them in virtually every non-trivial Python program.</p>
0353 <p><strong>Here&rsquo;s what you&rsquo;ll learn in this course:</strong> You&rsquo;ll cover the important characteristics of lists and tuples. You&rsquo;ll learn how to define them and how to manipulate them. When you&rsquo;re finished, you&rsquo;ll have a good feel for when and how to use these object types in a Python program.</p>
0354 <div class="alert alert-primary" role="alert">
0355 <p><strong><i class="fa fa-graduation-cap" aria-hidden="true"></i> Take the Quiz:</strong> Test your knowledge with our interactive “Python Lists and Tuples” quiz. Upon completion you will receive a score so you can track your learning progress over time:</p><p class="text-center my-2"><a class="btn btn-primary" href="/quizzes/python-lists-tuples/" target="_blank">Take the Quiz »</a></p>
0356 </div>
0357         <hr />
0358         <p><em>[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short &amp; sweet Python Trick delivered to your inbox every couple of days. <a href="https://realpython.com/python-tricks/?utm_source=realpython&amp;utm_medium=rss&amp;utm_campaign=footer">&gt;&gt; Click here to learn more and see examples</a> ]</em></p>#
0359 datePublished: #Tue Sep 3 14:00:00 2019#
0360 dateUpdated: #Tue Sep 3 14:00:00 2019#
0361 # Person begin ####################
0362 name: #Real Python#
0363 # Person end ######################
0364 # Item end ########################
0365 # Item begin ######################
0366 id: #https://realpython.com/natural-language-processing-spacy-python/#
0367 title: #Natural Language Processing With spaCy in Python#
0368 link: #https://realpython.com/natural-language-processing-spacy-python/#
0369 description: #In this step-by-step tutorial, you&apos;ll learn how to use spaCy. This free and open-source library for Natural Language Processing (NLP) in Python has a lot of built-in capabilities and is becoming increasingly popular for processing and analyzing data in NLP.#
0370 content: #<p><strong>spaCy</strong> is a free and open-source library for <strong>Natural Language Processing</strong> (NLP) in Python with a lot of built-in capabilities. It&rsquo;s becoming increasingly popular for processing and analyzing data in NLP. Unstructured textual data is produced at a large scale, and it&rsquo;s important to process it and derive insights from it. To do that, you need to represent the data in a format that computers can understand. NLP can help you do that.</p>
0371 <p><strong>In this tutorial, you&rsquo;ll learn:</strong></p>
0372 <ul>
0373 <li>What the foundational terms and concepts in NLP are</li>
0374 <li>How to implement those concepts in spaCy</li>
0375 <li>How to customize and extend built-in functionalities in spaCy</li>
0376 <li>How to perform basic statistical analysis on a text</li>
0377 <li>How to create a pipeline to process unstructured text</li>
0378 <li>How to parse a sentence and extract meaningful insights from it</li>
0379 </ul>
0380 <div class="alert alert-warning" role="alert"><p><strong>Free Bonus:</strong> <a href="" class="alert-link" data-toggle="modal" data-target="#modal-python-tricks-sample" data-focus="false">Click here to get access to a chapter from Python Tricks: The Book</a> that shows you Python's best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.</p></div>
0381 
0382 <h2 id="what-are-nlp-and-spacy">What Are NLP and spaCy?</h2>
0383 <p><strong>NLP</strong> is a subfield of <strong>Artificial Intelligence</strong> and is concerned with interactions between computers and human languages. NLP is the process by which computers analyze, understand, and derive meaning from human language.</p>
0384 <p>NLP helps you extract insights from unstructured text and has several use cases, such as:</p>
0385 <ul>
0386 <li><a href="https://en.wikipedia.org/wiki/Automatic_summarization">Automatic summarization</a></li>
0387 <li><a href="https://en.wikipedia.org/wiki/Named-entity_recognition">Named entity recognition</a></li>
0388 <li><a href="https://en.wikipedia.org/wiki/Question_answering">Question answering systems</a></li>
0389 <li><a href="https://en.wikipedia.org/wiki/Sentiment_analysis">Sentiment analysis</a></li>
0390 </ul>
0391 <p>spaCy is a free, open-source library for NLP in Python. It&rsquo;s written in <a href="https://cython.org/">Cython</a> and is designed for building information extraction or natural language understanding systems. It&rsquo;s built for production use and provides a concise and user-friendly API.</p>
0392 <h2 id="installation">Installation</h2>
0393 <p>In this section, you&rsquo;ll install spaCy and then download data and models for the English language.</p>
0394 <h3 id="how-to-install-spacy">How to Install spaCy</h3>
0395 <p>spaCy can be installed using <strong><code>pip</code></strong>, a Python package manager. You can use a <strong>virtual environment</strong> to avoid depending on system-wide packages. To learn more about virtual environments and <code>pip</code>, check out <a href="https://realpython.com/what-is-pip/">What Is Pip? A Guide for New Pythonistas</a> and <a href="https://realpython.com/python-virtual-environments-a-primer/">Python Virtual Environments: A Primer</a>.</p>
0396 <p>Create a new virtual environment:</p>
0397 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python3 -m venv env
0398 </pre></div>
0399 
0400 <p>Activate this virtual environment and install spaCy:</p>
0401 <div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nb">source</span> ./env/bin/activate
0402 <span class="gp">$</span> pip install spacy
0403 </pre></div>
0404 
0405 <h3 id="how-to-download-models-and-data">How to Download Models and Data</h3>
0406 <p>spaCy has <a href="https://spaCy.io/models">different types</a> of models. The default model for the English language is <code>en_core_web_sm</code>.</p>
0407 <p>Activate the virtual environment created in the previous step and download models and data for the English language:</p>
0408 <div class="highlight sh"><pre><span></span><span class="gp">$</span> python -m spacy download en_core_web_sm
0409 </pre></div>
0410 
0411 <p>Verify that the download was successful by loading the model:</p>
0412 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">spacy</span>
0413 <span class="gp">&gt;&gt;&gt; </span><span class="n">nlp</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">&#39;en_core_web_sm&#39;</span><span class="p">)</span>
0414 </pre></div>
0415 
0416 <p>If the <code>nlp</code> object is created without an error, then spaCy was installed and the model and data were downloaded successfully.</p>
0417 <h2 id="using-spacy">Using spaCy</h2>
0418 <p>In this section, you&rsquo;ll use spaCy for a given input string and a text file. Load the language model instance in spaCy:</p>
0419 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">spacy</span>
0420 <span class="gp">&gt;&gt;&gt; </span><span class="n">nlp</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">&#39;en_core_web_sm&#39;</span><span class="p">)</span>
0421 </pre></div>
0422 
0423 <p>Here, the <code>nlp</code> object is a language model instance. You can assume that, throughout this tutorial, <code>nlp</code> refers to the language model loaded by <code>en_core_web_sm</code>. Now you can use spaCy to read a string or a text file.</p>
0424 <h3 id="how-to-read-a-string">How to Read a String</h3>
0425 <p>You can use spaCy to create a processed <a href="https://spaCy.io/api/doc">Doc</a> object, which is a container for accessing linguistic annotations, for a given input string:</p>
0426 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">introduction_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;This tutorial is about Natural&#39;</span>
0427 <span class="gp">... </span>    <span class="s1">&#39; Language Processing in Spacy.&#39;</span><span class="p">)</span>
0428 <span class="gp">&gt;&gt;&gt; </span><span class="n">introduction_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">introduction_text</span><span class="p">)</span>
0429 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Extract tokens for the given doc</span>
0430 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">introduction_doc</span><span class="p">])</span>
0431 <span class="go">[&#39;This&#39;, &#39;tutorial&#39;, &#39;is&#39;, &#39;about&#39;, &#39;Natural&#39;, &#39;Language&#39;,</span>
0432 <span class="go">&#39;Processing&#39;, &#39;in&#39;, &#39;Spacy&#39;, &#39;.&#39;]</span>
0433 </pre></div>
0434 
0435 <p>In the above example, notice how the text is converted to an object that spaCy understands. You can use this method to convert any text into a processed <code>Doc</code> object and access its attributes, which are covered in the coming sections.</p>
0436 <h3 id="how-to-read-a-text-file">How to Read a Text File</h3>
0437 <p>In this section, you&rsquo;ll create a processed <a href="https://spaCy.io/api/doc">Doc</a> object for a text file:</p>
0438 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">file_name</span> <span class="o">=</span> <span class="s1">&#39;introduction.txt&#39;</span>
0439 <span class="gp">&gt;&gt;&gt; </span><span class="n">introduction_file_text</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">file_name</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
0440 <span class="gp">&gt;&gt;&gt; </span><span class="n">introduction_file_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">introduction_file_text</span><span class="p">)</span>
0441 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Extract tokens for the given doc</span>
0442 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">introduction_file_doc</span><span class="p">])</span>
0443 <span class="go">[&#39;This&#39;, &#39;tutorial&#39;, &#39;is&#39;, &#39;about&#39;, &#39;Natural&#39;, &#39;Language&#39;,</span>
0444 <span class="go">&#39;Processing&#39;, &#39;in&#39;, &#39;Spacy&#39;, &#39;.&#39;, &#39;\n&#39;]</span>
0445 </pre></div>
0446 
0447 <p>This is how you can convert a text file into a processed <code>Doc</code> object.</p>
0448 <div class="alert alert-primary" role="alert">
0449 <p><strong>Note:</strong> </p>
0450 <p>You can assume that:</p>
0451 <ul>
0452 <li>Variable names ending with the suffix <strong><code>_text</code></strong> are <strong><a href="https://realpython.com/python-encodings-guide/">Unicode</a> string objects</strong>.</li>
0453 <li>Variable names ending with the suffix <strong><code>_doc</code></strong> are <strong>spaCy <code>Doc</code> objects</strong>.</li>
0454 </ul>
0455 </div>
0456 <h2 id="sentence-detection">Sentence Detection</h2>
0457 <p><strong>Sentence Detection</strong> is the process of locating the start and end of sentences in a given text. This allows you to divide a text into linguistically meaningful units. You&rsquo;ll use these units when you&rsquo;re processing your text to perform tasks such as <strong>part of speech tagging</strong> and <strong>entity extraction</strong>.</p>
0458 <p>In spaCy, the <code>sents</code> property is used to extract sentences. Here&rsquo;s how you would extract the total number of sentences and the sentences for a given input text:</p>
0459 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">about_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;Gus Proto is a Python developer currently&#39;</span>
0460 <span class="gp">... </span>              <span class="s1">&#39; working for a London-based Fintech&#39;</span>
0461 <span class="gp">... </span>              <span class="s1">&#39; company. He is interested in learning&#39;</span>
0462 <span class="gp">... </span>              <span class="s1">&#39; Natural Language Processing.&#39;</span><span class="p">)</span>
0463 <span class="gp">&gt;&gt;&gt; </span><span class="n">about_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">about_text</span><span class="p">)</span>
0464 <span class="gp">&gt;&gt;&gt; </span><span class="n">sentences</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">about_doc</span><span class="o">.</span><span class="n">sents</span><span class="p">)</span>
0465 <span class="gp">&gt;&gt;&gt; </span><span class="nb">len</span><span class="p">(</span><span class="n">sentences</span><span class="p">)</span>
0466 <span class="go">2</span>
0467 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">sentence</span> <span class="ow">in</span> <span class="n">sentences</span><span class="p">:</span>
0468 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">sentence</span><span class="p">)</span>
0469 <span class="gp">...</span>
0470 <span class="go">&#39;Gus Proto is a Python developer currently working for a</span>
0471 <span class="go">London-based Fintech company.&#39;</span>
0472 <span class="go">&#39;He is interested in learning Natural Language Processing.&#39;</span>
0473 </pre></div>
0474 
0475 <p>In the above example, spaCy correctly identifies the two English sentences, each of which ends with a full stop (<code>.</code>). You can also customize sentence detection to split sentences on custom delimiters.</p>
0476 <p>Here&rsquo;s an example where an ellipsis (<code>...</code>) is used as the delimiter:</p>
0477 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">set_custom_boundaries</span><span class="p">(</span><span class="n">doc</span><span class="p">):</span>
0478 <span class="gp">... </span>    <span class="c1"># Adds support to use `...` as the delimiter for sentence detection</span>
0479 <span class="gp">... </span>    <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">doc</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]:</span>
0480 <span class="gp">... </span>        <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="o">==</span> <span class="s1">&#39;...&#39;</span><span class="p">:</span>
0481 <span class="gp">... </span>            <span class="n">doc</span><span class="p">[</span><span class="n">token</span><span class="o">.</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">is_sent_start</span> <span class="o">=</span> <span class="kc">True</span>
0482 <span class="gp">... </span>    <span class="k">return</span> <span class="n">doc</span>
0483 <span class="gp">...</span>
0484 <span class="gp">&gt;&gt;&gt; </span><span class="n">ellipsis_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;Gus, can you, ... never mind, I forgot&#39;</span>
0485 <span class="gp">... </span>                 <span class="s1">&#39; what I was saying. So, do you think&#39;</span>
0486 <span class="gp">... </span>                 <span class="s1">&#39; we should ...&#39;</span><span class="p">)</span>
0487 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Load a new model instance</span>
0488 <span class="gp">&gt;&gt;&gt; </span><span class="n">custom_nlp</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">&#39;en_core_web_sm&#39;</span><span class="p">)</span>
0489 <span class="gp">&gt;&gt;&gt; </span><span class="n">custom_nlp</span><span class="o">.</span><span class="n">add_pipe</span><span class="p">(</span><span class="n">set_custom_boundaries</span><span class="p">,</span> <span class="n">before</span><span class="o">=</span><span class="s1">&#39;parser&#39;</span><span class="p">)</span>
0490 <span class="gp">&gt;&gt;&gt; </span><span class="n">custom_ellipsis_doc</span> <span class="o">=</span> <span class="n">custom_nlp</span><span class="p">(</span><span class="n">ellipsis_text</span><span class="p">)</span>
0491 <span class="gp">&gt;&gt;&gt; </span><span class="n">custom_ellipsis_sentences</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">custom_ellipsis_doc</span><span class="o">.</span><span class="n">sents</span><span class="p">)</span>
0492 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">sentence</span> <span class="ow">in</span> <span class="n">custom_ellipsis_sentences</span><span class="p">:</span>
0493 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">sentence</span><span class="p">)</span>
0494 <span class="gp">...</span>
0495 <span class="go">Gus, can you, ...</span>
0496 <span class="go">never mind, I forgot what I was saying.</span>
0497 <span class="go">So, do you think we should ...</span>
0498 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Sentence Detection with no customization</span>
0499 <span class="gp">&gt;&gt;&gt; </span><span class="n">ellipsis_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">ellipsis_text</span><span class="p">)</span>
0500 <span class="gp">&gt;&gt;&gt; </span><span class="n">ellipsis_sentences</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">ellipsis_doc</span><span class="o">.</span><span class="n">sents</span><span class="p">)</span>
0501 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">sentence</span> <span class="ow">in</span> <span class="n">ellipsis_sentences</span><span class="p">:</span>
0502 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">sentence</span><span class="p">)</span>
0503 <span class="gp">...</span>
0504 <span class="go">Gus, can you, ... never mind, I forgot what I was saying.</span>
0505 <span class="go">So, do you think we should ...</span>
0506 </pre></div>
0507 
0508 <p>Note that <code>custom_ellipsis_sentences</code> contains three sentences, whereas <code>ellipsis_sentences</code> contains two. These sentences are still obtained via the <code>sents</code> attribute, as you saw before.</p>
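<p>To see why model-based sentence detection is worth having, it helps to compare it with a naive, rule-based splitter written in plain Python. The <code>naive_sentences()</code> helper below is a hypothetical illustration, not part of spaCy. It treats every run of <code>.</code>, <code>!</code>, or <code>?</code> as the end of a sentence. That rule works for <code>about_text</code>, but an abbreviation such as <code>Dr.</code> fools it, which is the kind of case a statistical model is meant to handle:</p>
<div class="highlight python"><pre><span></span># naive_sentences.py
import re

def naive_sentences(text):
    """Split text after each run of ., ! or ? (a naive rule, not spaCy's).

    Any trailing text without final punctuation is dropped.
    """
    pieces = re.findall(r"[^.!?]+[.!?]+", text)
    return [piece.strip() for piece in pieces]

about_text = ('Gus Proto is a Python developer currently'
              ' working for a London-based Fintech'
              ' company. He is interested in learning'
              ' Natural Language Processing.')

print(len(naive_sentences(about_text)))  # 2
print(naive_sentences("Dr. Smith arrived."))  # ['Dr.', 'Smith arrived.']
</pre></div>
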
0509 <h2 id="tokenization-in-spacy">Tokenization in spaCy</h2>
0510 <p><strong>Tokenization</strong> is the next step after sentence detection. It allows you to identify the basic units in your text, which are called <strong>tokens</strong>. Tokenization is useful because it breaks a text into meaningful units that can be used for further analysis, like part of speech tagging.</p>
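<p>Tokenization is more than splitting on whitespace, and a short comparison makes the difference concrete. The sketch below compares <code>str.split()</code> with a simple regular expression that also separates punctuation. It is a hypothetical plain-Python illustration, not spaCy&rsquo;s tokenizer, which applies language-specific prefix, suffix, and infix rules plus special-case exceptions:</p>
<div class="highlight python"><pre><span></span># naive_tokens.py
import re

text = "He works for a London-based company."

# Whitespace splitting leaves punctuation attached to words
print(text.split())
# ['He', 'works', 'for', 'a', 'London-based', 'company.']

# Hypothetical rule: a token is a run of word characters or a single symbol
print(re.findall(r"\w+|[^\w\s]", text))
# ['He', 'works', 'for', 'a', 'London', '-', 'based', 'company', '.']
</pre></div>
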
0511 <p>In spaCy, you can print tokens by iterating over the <code>Doc</code> object:</p>
0512 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span><span class="p">:</span>
0513 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">idx</span><span class="p">)</span>
0514 <span class="gp">...</span>
0515 <span class="go">Gus 0</span>
0516 <span class="go">Proto 4</span>
0517 <span class="go">is 10</span>
0518 <span class="go">a 13</span>
0519 <span class="go">Python 15</span>
0520 <span class="go">developer 22</span>
0521 <span class="go">currently 32</span>
0522 <span class="go">working 42</span>
0523 <span class="go">for 50</span>
0524 <span class="go">a 54</span>
0525 <span class="go">London 56</span>
0526 <span class="go">- 62</span>
0527 <span class="go">based 63</span>
0528 <span class="go">Fintech 69</span>
0529 <span class="go">company 77</span>
0530 <span class="go">. 84</span>
0531 <span class="go">He 86</span>
0532 <span class="go">is 89</span>
0533 <span class="go">interested 92</span>
0534 <span class="go">in 103</span>
0535 <span class="go">learning 106</span>
0536 <span class="go">Natural 115</span>
0537 <span class="go">Language 123</span>
0538 <span class="go">Processing 132</span>
0539 <span class="go">. 142</span>
0540 </pre></div>
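<p>The <code>idx</code> values are character offsets into the original string. The sketch below reproduces them with a hypothetical regular expression, using <code>re.finditer()</code> to record where each match starts. It happens to agree with spaCy on this text but isn&rsquo;t a replacement for spaCy&rsquo;s tokenizer. It then uses one offset to replace a word in place:</p>
<div class="highlight python"><pre><span></span># token_offsets.py
import re

about_text = ('Gus Proto is a Python developer currently'
              ' working for a London-based Fintech'
              ' company. He is interested in learning'
              ' Natural Language Processing.')

# Each match yields the token text and its starting character index
tokens = [(match.group(), match.start())
          for match in re.finditer(r"\w+|[^\w\s]", about_text)]
print(tokens[:3])  # [('Gus', 0), ('Proto', 4), ('is', 10)]
print(tokens[-1])  # ('.', 142)

# Use the offset of 'London' to swap in another city
word, start = tokens[10]  # ('London', 56)
edited = about_text[:start] + "Berlin" + about_text[start + len(word):]
print(edited[start:start + len("Berlin-based")])  # Berlin-based
</pre></div>
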
0541 
0542 <p>Note how spaCy preserves the <strong>starting index</strong> of each token in its <code>idx</code> attribute. This is useful for in-place word replacement. spaCy provides <a href="https://spacy.io/api/token#attributes">various attributes</a> for the <code>Token</code> class:</p>
0543 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span><span class="p">:</span>
0544 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">idx</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">text_with_ws</span><span class="p">,</span>
0545 <span class="gp">... </span>           <span class="n">token</span><span class="o">.</span><span class="n">is_alpha</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">is_punct</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">is_space</span><span class="p">,</span>
0546 <span class="gp">... </span>           <span class="n">token</span><span class="o">.</span><span class="n">shape_</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">is_stop</span><span class="p">)</span>
0547 <span class="gp">...</span>
0548 <span class="go">Gus 0 Gus  True False False Xxx False</span>
0549 <span class="go">Proto 4 Proto  True False False Xxxxx False</span>
0550 <span class="go">is 10 is  True False False xx True</span>
0551 <span class="go">a 13 a  True False False x True</span>
0552 <span class="go">Python 15 Python  True False False Xxxxx False</span>
0553 <span class="go">developer 22 developer  True False False xxxx False</span>
0554 <span class="go">currently 32 currently  True False False xxxx False</span>
0555 <span class="go">working 42 working  True False False xxxx False</span>
0556 <span class="go">for 50 for  True False False xxx True</span>
0557 <span class="go">a 54 a  True False False x True</span>
0558 <span class="go">London 56 London True False False Xxxxx False</span>
0559 <span class="go">- 62 - False True False - False</span>
0560 <span class="go">based 63 based  True False False xxxx False</span>
0561 <span class="go">Fintech 69 Fintech  True False False Xxxxx False</span>
0562 <span class="go">company 77 company True False False xxxx False</span>
0563 <span class="go">. 84 .  False True False . False</span>
0564 <span class="go">He 86 He  True False False Xx True</span>
0565 <span class="go">is 89 is  True False False xx True</span>
0566 <span class="go">interested 92 interested  True False False xxxx False</span>
0567 <span class="go">in 103 in  True False False xx True</span>
0568 <span class="go">learning 106 learning  True False False xxxx False</span>
0569 <span class="go">Natural 115 Natural  True False False Xxxxx False</span>
0570 <span class="go">Language 123 Language  True False False Xxxxx False</span>
0571 <span class="go">Processing 132 Processing True False False Xxxxx False</span>
0572 <span class="go">. 142 . False True False . False</span>
0573 </pre></div>
0574 
0575 <p>Besides <code>idx</code>, which holds the starting character offset of the token, this example accesses some commonly used attributes:</p>
0576 <ul>
0577 <li><strong><code>text_with_ws</code></strong> returns the token text with its trailing whitespace (if any).</li>
0578 <li><strong><code>is_alpha</code></strong> indicates whether the token consists only of alphabetic characters.</li>
0579 <li><strong><code>is_punct</code></strong> indicates whether the token is a punctuation symbol.</li>
0580 <li><strong><code>is_space</code></strong> indicates whether the token consists only of whitespace.</li>
0581 <li><strong><code>shape_</code></strong> gives the orthographic shape of the word: uppercase letters become <code>X</code>, lowercase letters become <code>x</code>, and digits become <code>d</code>, with runs of the same character cut off after four.</li>
0582 <li><strong><code>is_stop</code></strong> indicates whether the token is a stop word.</li>
0583 </ul>
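<p>To see how <code>shape_</code> arrives at values like <code>Xxxxx</code> for <code>Python</code>, here is a minimal pure-Python sketch of the rule described above. The function name <code>word_shape</code> is hypothetical, and the code approximates spaCy&rsquo;s behavior rather than reproducing its implementation (for example, it ignores any special handling spaCy may apply to very long tokens):</p>

```python
def word_shape(text):
    """Approximate spaCy's Token.shape_ (hypothetical re-implementation).

    Uppercase letters become X, lowercase letters x, digits d, and any
    other character is kept as-is. Runs of the same shape character are
    cut off after four.
    """
    shape = []
    last_char = ""
    run_length = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char
        # Track how long the current run of identical shape characters is
        if shape_char == last_char:
            run_length += 1
        else:
            last_char = shape_char
            run_length = 1
        # Drop characters beyond the fourth in a run
        if run_length > 4:
            continue
        shape.append(shape_char)
    return "".join(shape)


for word in ["Gus", "Python", "developer", "2019", "+1"]:
    print(word, word_shape(word))
```

<p>On the tokens above, this gives <code>Xxx</code> for <code>Gus</code>, <code>Xxxxx</code> for <code>Python</code>, and <code>xxxx</code> for <code>developer</code>, matching the spaCy output.</p>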
0584 <div class="alert alert-primary" role="alert">
0585 <p><strong>Note:</strong> You&rsquo;ll learn more about <strong>stop words</strong> in the next section.</p>
0586 </div>
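<p>The starting index returned by <code>token.idx</code> makes in-place replacement straightforward: you slice the original string at each token&rsquo;s offset. The sketch below illustrates the idea in plain Python, assuming a simple regex tokenizer in place of spaCy. Both helper names are hypothetical, and with a real <code>Doc</code> you would read <code>token.text</code> and <code>token.idx</code> instead:</p>

```python
import re


def tokens_with_offsets(text):
    """Return (token_text, start_offset) pairs.

    A rough, hypothetical stand-in for spaCy's token.text and token.idx:
    runs of word characters are tokens, and every other non-space
    character is a token of its own.
    """
    return [(match.group(), match.start())
            for match in re.finditer(r"\w+|[^\w\s]", text)]


def replace_tokens(text, replacements):
    """Replace whole tokens in place using their start offsets."""
    result = text
    # Go right to left so that earlier offsets stay valid after each edit
    for token, start in reversed(tokens_with_offsets(text)):
        if token in replacements:
            end = start + len(token)
            result = result[:start] + replacements[token] + result[end:]
    return result


about_text = ("Gus Proto is a Python developer currently working for a "
              "London-based Fintech company.")
print(tokens_with_offsets(about_text)[:4])
print(replace_tokens(about_text, {"Gus": "Augustus", "Python": "Go"}))
```

<p>Walking the tokens from right to left is the key design choice: a replacement of a different length only shifts text after it, so the offsets of the tokens still to be processed remain correct.</p>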
0587 <p>You can also customize the tokenization process to detect tokens on custom characters. This is often done for hyphenated words, which are words joined with a hyphen. For example, &ldquo;London-based&rdquo; is a hyphenated word.</p>
0588 <p>spaCy allows you to customize tokenization by updating the <code>tokenizer</code> property on the <code>nlp</code> object:</p>
0589 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">re</span>
0590 <span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">spacy</span>
0591 <span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">spacy.tokenizer</span> <span class="k">import</span> <span class="n">Tokenizer</span>
0592 <span class="gp">&gt;&gt;&gt; </span><span class="n">custom_nlp</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">&#39;en_core_web_sm&#39;</span><span class="p">)</span>
0593 <span class="gp">&gt;&gt;&gt; </span><span class="n">prefix_re</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">compile_prefix_regex</span><span class="p">(</span><span class="n">custom_nlp</span><span class="o">.</span><span class="n">Defaults</span><span class="o">.</span><span class="n">prefixes</span><span class="p">)</span>
0594 <span class="gp">&gt;&gt;&gt; </span><span class="n">suffix_re</span> <span class="o">=</span> <span class="n">spacy</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">compile_suffix_regex</span><span class="p">(</span><span class="n">custom_nlp</span><span class="o">.</span><span class="n">Defaults</span><span class="o">.</span><span class="n">suffixes</span><span class="p">)</span>
0595 <span class="gp">&gt;&gt;&gt; </span><span class="n">infix_re</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">&#39;&#39;&#39;[-~]&#39;&#39;&#39;</span><span class="p">)</span>
0596 <span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">customize_tokenizer</span><span class="p">(</span><span class="n">nlp</span><span class="p">):</span>
0597 <span class="gp">... </span>    <span class="c1"># Adds support to use `-` as the delimiter for tokenization</span>
0598 <span class="gp">... </span>    <span class="k">return</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">nlp</span><span class="o">.</span><span class="n">vocab</span><span class="p">,</span> <span class="n">prefix_search</span><span class="o">=</span><span class="n">prefix_re</span><span class="o">.</span><span class="n">search</span><span class="p">,</span>
0599 <span class="gp">... </span>                     <span class="n">suffix_search</span><span class="o">=</span><span class="n">suffix_re</span><span class="o">.</span><span class="n">search</span><span class="p">,</span>
0600 <span class="gp">... </span>                     <span class="n">infix_finditer</span><span class="o">=</span><span class="n">infix_re</span><span class="o">.</span><span class="n">finditer</span><span class="p">,</span>
0601 <span class="gp">... </span>                     <span class="n">token_match</span><span class="o">=</span><span class="kc">None</span>
0602 <span class="gp">... </span>                     <span class="p">)</span>
0603 <span class="gp">...</span>
0604 
0605 <span class="gp">&gt;&gt;&gt; </span><span class="n">custom_nlp</span><span class="o">.</span><span class="n">tokenizer</span> <span class="o">=</span> <span class="n">customize_tokenizer</span><span class="p">(</span><span class="n">custom_nlp</span><span class="p">)</span>
0606 <span class="gp">&gt;&gt;&gt; </span><span class="n">custom_tokenizer_about_doc</span> <span class="o">=</span> <span class="n">custom_nlp</span><span class="p">(</span><span class="n">about_text</span><span class="p">)</span>
0607 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">custom_tokenizer_about_doc</span><span class="p">])</span>
0608 <span class="go">[&#39;Gus&#39;, &#39;Proto&#39;, &#39;is&#39;, &#39;a&#39;, &#39;Python&#39;, &#39;developer&#39;, &#39;currently&#39;,</span>
0609 <span class="go">&#39;working&#39;, &#39;for&#39;, &#39;a&#39;, &#39;London&#39;, &#39;-&#39;, &#39;based&#39;, &#39;Fintech&#39;,</span>
0610 <span class="go">&#39;company&#39;, &#39;.&#39;, &#39;He&#39;, &#39;is&#39;, &#39;interested&#39;, &#39;in&#39;, &#39;learning&#39;,</span>
0611 <span class="go">&#39;Natural&#39;, &#39;Language&#39;, &#39;Processing&#39;, &#39;.&#39;]</span>
0612 </pre></div>
0613 
0614 <p>To customize tokenization, you can pass the following parameters to the <code>Tokenizer</code> class:</p>
0615 <ul>
0616 <li><strong><code>nlp.vocab</code></strong> is the shared vocabulary, a storage container for lexical types. Special cases like contractions and emoticons are handled by a separate <code>rules</code> argument, which this example does not pass.</li>
0617 <li><strong><code>prefix_search</code></strong> is the function that is used to handle preceding punctuation, such as opening parentheses.</li>
0618 <li><strong><code>infix_finditer</code></strong> is the function that is used to handle non-whitespace separators, such as hyphens.</li>
0619 <li><strong><code>suffix_search</code></strong> is the function that is used to handle trailing punctuation, such as closing parentheses.</li>
0620 <li><strong><code>token_match</code></strong> is an optional boolean function that is used to match strings that should never be split. It overrides the previous rules and is useful for entities like URLs or numbers.</li>
0621 </ul>
0622 <div class="alert alert-primary" role="alert">
0623 <p><strong>Note:</strong> spaCy&rsquo;s default tokenizer already splits hyphenated words into separate tokens. The above code is just an example of how tokenization can be customized, and the same approach works for other characters.</p>
0624 </div>
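<p>To make the roles of the prefix, suffix, and infix rules more concrete, here is a toy tokenizer in plain Python that applies them, in that order, to each whitespace-separated chunk. It is a simplified sketch rather than spaCy&rsquo;s algorithm: the rule sets and the name <code>toy_tokenize</code> are hypothetical, and spaCy&rsquo;s real tokenizer also interleaves these steps with special-case rules and <code>token_match</code> checks:</p>

```python
import re

# Simplified, hypothetical rules; spaCy's real defaults are far larger.
PREFIX_RE = re.compile(r'^[(\["\']')      # one leading punctuation char
SUFFIX_RE = re.compile(r'[)\]"\'.,!?]$')  # one trailing punctuation char
INFIX_RE = re.compile(r'([-~])')          # separators inside a word


def toy_tokenize(text):
    tokens = []
    for chunk in text.split():  # 1. Split on whitespace
        prefixes, suffixes = [], []
        # 2. Strip leading punctuation one match at a time
        match = PREFIX_RE.search(chunk)
        while match:
            prefixes.append(match.group())
            chunk = chunk[match.end():]
            match = PREFIX_RE.search(chunk)
        # 3. Strip trailing punctuation one match at a time
        match = SUFFIX_RE.search(chunk)
        while match:
            suffixes.insert(0, match.group())
            chunk = chunk[:match.start()]
            match = SUFFIX_RE.search(chunk)
        # 4. Split the remainder on infixes, keeping the separators
        middle = [piece for piece in INFIX_RE.split(chunk) if piece]
        tokens.extend(prefixes + middle + suffixes)
    return tokens


print(toy_tokenize("He works for a London-based company."))
```

<p>Because <code>INFIX_RE</code> wraps the separator in a capture group, <code>re.split()</code> keeps the hyphen as its own token, which mirrors how <code>London-based</code> becomes <code>London</code>, <code>-</code>, and <code>based</code> in the spaCy output.</p>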
0625 <h2 id="stop-words">Stop Words</h2>
0626 <p><strong>Stop words</strong> are the most common words in a language. In the English language, some examples of stop words are <code>the</code>, <code>are</code>, <code>but</code>, and <code>they</code>. Most sentences need to contain stop words in order to be full sentences that make sense.</p>
0627 <p>Stop words are often removed because they carry little meaning on their own and distort word frequency analysis. spaCy has a list of stop words for the English language:</p>
0628 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">spacy.lang.en.stop_words</span> <span class="k">import</span> <span class="n">STOP_WORDS</span>
0629 <span class="gp">&gt;&gt;&gt; </span><span class="n">spacy_stopwords</span> <span class="o">=</span> <span class="n">STOP_WORDS</span>
0630 <span class="gp">&gt;&gt;&gt; </span><span class="nb">len</span><span class="p">(</span><span class="n">spacy_stopwords</span><span class="p">)</span>
0631 <span class="go">326</span>
0632 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">stop_word</span> <span class="ow">in</span> <span class="nb">list</span><span class="p">(</span><span class="n">spacy_stopwords</span><span class="p">)[:</span><span class="mi">10</span><span class="p">]:</span>
0633 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">stop_word</span><span class="p">)</span>
0634 <span class="gp">...</span>
0635 <span class="go">using</span>
0636 <span class="go">becomes</span>
0637 <span class="go">had</span>
0638 <span class="go">itself</span>
0639 <span class="go">once</span>
0640 <span class="go">often</span>
0641 <span class="go">is</span>
0642 <span class="go">herein</span>
0643 <span class="go">who</span>
0644 <span class="go">too</span>
0645 </pre></div>
0646 
0647 <p>You can remove stop words from the input text:</p>
0648 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span><span class="p">:</span>
0649 <span class="gp">... </span>    <span class="k">if</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">is_stop</span><span class="p">:</span>
0650 <span class="gp">... </span>        <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
0651 <span class="gp">...</span>
0652 <span class="go">Gus</span>
0653 <span class="go">Proto</span>
0654 <span class="go">Python</span>
0655 <span class="go">developer</span>
0656 <span class="go">currently</span>
0657 <span class="go">working</span>
0658 <span class="go">London</span>
0659 <span class="go">-</span>
0660 <span class="go">based</span>
0661 <span class="go">Fintech</span>
0662 <span class="go">company</span>
0663 <span class="go">.</span>
0664 <span class="go">interested</span>
0665 <span class="go">learning</span>
0666 <span class="go">Natural</span>
0667 <span class="go">Language</span>
0668 <span class="go">Processing</span>
0669 <span class="go">.</span>
0670 </pre></div>
0671 
0672 <p>Stop words like <code>is</code>, <code>a</code>, <code>for</code>, <code>in</code>, and <code>He</code> are not printed in the output above. You can also create a list of tokens not containing stop words:</p>
0673 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">about_no_stopword_doc</span> <span class="o">=</span> <span class="p">[</span><span class="n">token</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">is_stop</span><span class="p">]</span>
0674 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span><span class="p">(</span><span class="n">about_no_stopword_doc</span><span class="p">)</span>
0675 <span class="go">[Gus, Proto, Python, developer, currently, working, London,</span>
0676 <span class="go">-, based, Fintech, company, ., interested, learning, Natural,</span>
0677 <span class="go">Language, Processing, .]</span>
0678 </pre></div>
0679 
0680 <p>You can join the text of the tokens in <code>about_no_stopword_doc</code> with spaces to form a sentence with no stop words.</p>
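<p>Joining needs strings, so with spaCy you would join <code>token.text</code> rather than the <code>Token</code> objects themselves. The following plain-Python sketch shows the whole filter-and-join step on a list of strings. The five-word <code>STOP_WORDS</code> set and the <code>remove_stop_words</code> helper are hypothetical stand-ins for spaCy&rsquo;s much larger list and its <code>is_stop</code> attribute:</p>

```python
# A tiny, hypothetical stop-word list; spaCy's English list is far larger.
STOP_WORDS = {"a", "for", "he", "in", "is"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only tokens whose lowercase form is not a stop word."""
    return [token for token in tokens if token.lower() not in stop_words]


tokens = ["Gus", "Proto", "is", "a", "Python", "developer", ".",
          "He", "is", "interested", "in", "learning",
          "Natural", "Language", "Processing", "."]
content_tokens = remove_stop_words(tokens)
print(" ".join(content_tokens))
```

<p>Joining with single spaces puts a space before each period, since punctuation tokens are kept.</p>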
0681 <h2 id="lemmatization">Lemmatization</h2>
0682 <p><strong>Lemmatization</strong> is the process of reducing inflected forms of a word while still ensuring that the reduced form belongs to the language. This reduced form or root word is called a <strong>lemma</strong>.</p>
0683 <p>For example, <em>organizes</em>, <em>organized</em> and <em>organizing</em> are all forms of <em>organize</em>. Here, <em>organize</em> is the lemma. The inflection of a word allows you to express different grammatical categories like tense (<em>organized</em> vs <em>organize</em>), number (<em>trains</em> vs <em>train</em>), and so on. Lemmatization is necessary because it helps you reduce the inflected forms of a word so that they can be analyzed as a single item. It can also help you <strong>normalize</strong> the text.</p>
0684 <p>spaCy has the attribute <code>lemma_</code> on the <code>Token</code> class. This attribute has the lemmatized form of a token:</p>
0685 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">conference_help_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;Gus is helping organize a developer&#39;</span>
0686 <span class="gp">... </span>    <span class="s1">&#39; conference on Applications of Natural Language&#39;</span>
0687 <span class="gp">... </span>    <span class="s1">&#39; Processing. He keeps organizing local Python meetups&#39;</span>
0688 <span class="gp">... </span>    <span class="s1">&#39; and several internal talks at his workplace.&#39;</span><span class="p">)</span>
0689 <span class="gp">&gt;&gt;&gt; </span><span class="n">conference_help_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">conference_help_text</span><span class="p">)</span>
0690 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">conference_help_doc</span><span class="p">:</span>
0691 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">lemma_</span><span class="p">)</span>
0692 <span class="gp">...</span>
0693 <span class="go">Gus Gus</span>
0694 <span class="go">is be</span>
0695 <span class="go">helping help</span>
0696 <span class="go">organize organize</span>
0697 <span class="go">a a</span>
0698 <span class="go">developer developer</span>
0699 <span class="go">conference conference</span>
0700 <span class="go">on on</span>
0701 <span class="go">Applications Applications</span>
0702 <span class="go">of of</span>
0703 <span class="go">Natural Natural</span>
0704 <span class="go">Language Language</span>
0705 <span class="go">Processing Processing</span>
0706 <span class="go">. .</span>
0707 <span class="go">He -PRON-</span>
0708 <span class="go">keeps keep</span>
0709 <span class="go">organizing organize</span>
0710 <span class="go">local local</span>
0711 <span class="go">Python Python</span>
0712 <span class="go">meetups meetup</span>
0713 <span class="go">and and</span>
0714 <span class="go">several several</span>
0715 <span class="go">internal internal</span>
0716 <span class="go">talks talk</span>
0717 <span class="go">at at</span>
0718 <span class="go">his -PRON-</span>
0719 <span class="go">workplace workplace</span>
0720 <span class="go">. .</span>
0721 </pre></div>
0722 
0723 <p>In this example, <code>organizing</code> reduces to its lemma form <code>organize</code>. If you do not lemmatize the text, then <code>organize</code> and <code>organizing</code> will be counted as different tokens, even though they both have a similar meaning. Lemmatization helps you avoid duplicate words that have similar meanings.</p>
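<p>You can sketch the effect of lemmatization on counting without spaCy. The following example uses a small hypothetical lookup table, <code>LEMMAS</code>, in place of a real lemmatizer (spaCy&rsquo;s lemmatizer is more sophisticated and can take the part of speech into account). It compares word counts before and after lemmatization:</p>

```python
from collections import Counter

# Hypothetical lookup table; a real lemmatizer covers far more words
LEMMAS = {"organizes": "organize", "organized": "organize",
          "organizing": "organize", "keeps": "keep", "is": "be",
          "meetups": "meetup", "talks": "talk"}


def lemmatize(tokens):
    """Map each token to its lemma, leaving unknown words unchanged."""
    return [LEMMAS.get(token.lower(), token) for token in tokens]


words = ["Gus", "is", "helping", "organize", "a", "conference",
         "He", "keeps", "organizing", "local", "meetups"]

surface_counts = Counter(words)
lemma_counts = Counter(lemmatize(words))
# Before lemmatization, the two forms are counted separately
print(surface_counts["organize"], surface_counts["organizing"])
# After lemmatization, both count toward the same lemma
print(lemma_counts["organize"])
```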
0724 <h2 id="word-frequency">Word Frequency</h2>
0725 <p>You can now convert a given text into tokens and perform statistical analysis over it. This analysis can give you various insights about word patterns, such as common words or unique words in the text:</p>
0726 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">collections</span> <span class="k">import</span> <span class="n">Counter</span>
0727 <span class="gp">&gt;&gt;&gt; </span><span class="n">complete_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;Gus Proto is a Python developer currently&#39;</span>
0728 <span class="gp">... </span>    <span class="s1">&#39; working for a London-based Fintech company. He is&#39;</span>
0729 <span class="gp">... </span>    <span class="s1">&#39; interested in learning Natural Language Processing.&#39;</span>
0730 <span class="gp">... </span>    <span class="s1">&#39; There is a developer conference happening on 21 July&#39;</span>
0731 <span class="gp">... </span>    <span class="s1">&#39; 2019 in London. It is titled &quot;Applications of Natural&#39;</span>
0732 <span class="gp">... </span>    <span class="s1">&#39; Language Processing&quot;. There is a helpline number&#39;</span>
0733 <span class="gp">... </span>    <span class="s1">&#39; available at +1-1234567891. Gus is helping organize it.&#39;</span>
0734 <span class="gp">... </span>    <span class="s1">&#39; He keeps organizing local Python meetups and several&#39;</span>
0735 <span class="gp">... </span>    <span class="s1">&#39; internal talks at his workplace. Gus is also presenting&#39;</span>
0736 <span class="gp">... </span>    <span class="s1">&#39; a talk. The talk will introduce the reader about &quot;Use&#39;</span>
0737 <span class="gp">... </span>    <span class="s1">&#39; cases of Natural Language Processing in Fintech&quot;.&#39;</span>
0738 <span class="gp">... </span>    <span class="s1">&#39; Apart from his work, he is very passionate about music.&#39;</span>
0739 <span class="gp">... </span>    <span class="s1">&#39; Gus is learning to play the Piano. He has enrolled&#39;</span>
0740 <span class="gp">... </span>    <span class="s1">&#39; himself in the weekend batch of Great Piano Academy.&#39;</span>
0741 <span class="gp">... </span>    <span class="s1">&#39; Great Piano Academy is situated in Mayfair or the City&#39;</span>
0742 <span class="gp">... </span>    <span class="s1">&#39; of London and has world-class piano instructors.&#39;</span><span class="p">)</span>
0743 <span class="gp">...</span>
0744 <span class="gp">&gt;&gt;&gt; </span><span class="n">complete_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">complete_text</span><span class="p">)</span>
0745 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Remove stop words and punctuation symbols</span>
0746 <span class="gp">&gt;&gt;&gt; </span><span class="n">words</span> <span class="o">=</span> <span class="p">[</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">complete_doc</span>
0747 <span class="gp">... </span>         <span class="k">if</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">is_stop</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">is_punct</span><span class="p">]</span>
0748 <span class="gp">&gt;&gt;&gt; </span><span class="n">word_freq</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">(</span><span class="n">words</span><span class="p">)</span>
0749 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># 5 commonly occurring words with their frequencies</span>
0750 <span class="gp">&gt;&gt;&gt; </span><span class="n">common_words</span> <span class="o">=</span> <span class="n">word_freq</span><span class="o">.</span><span class="n">most_common</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
0751 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span><span class="p">(</span><span class="n">common_words</span><span class="p">)</span>
0752 <span class="go">[(&#39;Gus&#39;, 4), (&#39;London&#39;, 3), (&#39;Natural&#39;, 3), (&#39;Language&#39;, 3), (&#39;Processing&#39;, 3)]</span>
0753 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Unique words</span>
0754 <span class="gp">&gt;&gt;&gt; </span><span class="n">unique_words</span> <span class="o">=</span> <span class="p">[</span><span class="n">word</span> <span class="k">for</span> <span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">freq</span><span class="p">)</span> <span class="ow">in</span> <span class="n">word_freq</span><span class="o">.</span><span class="n">items</span><span class="p">()</span> <span class="k">if</span> <span class="n">freq</span> <span class="o">==</span> <span class="mi">1</span><span class="p">]</span>
0755 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span><span class="p">(</span><span class="n">unique_words</span><span class="p">)</span>
0756 <span class="go">[&#39;Proto&#39;, &#39;currently&#39;, &#39;working&#39;, &#39;based&#39;, &#39;company&#39;,</span>
0757 <span class="go">&#39;interested&#39;, &#39;conference&#39;, &#39;happening&#39;, &#39;21&#39;, &#39;July&#39;,</span>
0758 <span class="go">&#39;2019&#39;, &#39;titled&#39;, &#39;Applications&#39;, &#39;helpline&#39;, &#39;number&#39;,</span>
0759 <span class="go">&#39;available&#39;, &#39;+1&#39;, &#39;1234567891&#39;, &#39;helping&#39;, &#39;organize&#39;,</span>
0760 <span class="go">&#39;keeps&#39;, &#39;organizing&#39;, &#39;local&#39;, &#39;meetups&#39;, &#39;internal&#39;,</span>
0761 <span class="go">&#39;talks&#39;, &#39;workplace&#39;, &#39;presenting&#39;, &#39;introduce&#39;, &#39;reader&#39;,</span>
0762 <span class="go">&#39;Use&#39;, &#39;cases&#39;, &#39;Apart&#39;, &#39;work&#39;, &#39;passionate&#39;, &#39;music&#39;, &#39;play&#39;,</span>
0763 <span class="go">&#39;enrolled&#39;, &#39;weekend&#39;, &#39;batch&#39;, &#39;situated&#39;, &#39;Mayfair&#39;, &#39;City&#39;,</span>
0764 <span class="go">&#39;world&#39;, &#39;class&#39;, &#39;piano&#39;, &#39;instructors&#39;]</span>
0765 </pre></div>
0766 
0767 <p>By looking at the common words, you can see that the text as a whole is probably about <code>Gus</code>, <code>London</code>, or <code>Natural Language Processing</code>. This way, you can take any unstructured text and perform statistical analysis to know what it&rsquo;s about.</p>
0768 <p>Here&rsquo;s the same analysis of the same text with stop words included:</p>
0769 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">words_all</span> <span class="o">=</span> <span class="p">[</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">complete_doc</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">is_punct</span><span class="p">]</span>
0770 <span class="gp">&gt;&gt;&gt; </span><span class="n">word_freq_all</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">(</span><span class="n">words_all</span><span class="p">)</span>
0771 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># 5 commonly occurring words with their frequencies</span>
0772 <span class="gp">&gt;&gt;&gt; </span><span class="n">common_words_all</span> <span class="o">=</span> <span class="n">word_freq_all</span><span class="o">.</span><span class="n">most_common</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
0773 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span><span class="p">(</span><span class="n">common_words_all</span><span class="p">)</span>
0774 <span class="go">[(&#39;is&#39;, 10), (&#39;a&#39;, 5), (&#39;in&#39;, 5), (&#39;Gus&#39;, 4), (&#39;of&#39;, 4)]</span>
0775 </pre></div>
0776 
0777 <p>Four out of five of the most common words are stop words, which don&rsquo;t tell you much about the text. If you consider stop words while doing word frequency analysis, then you won&rsquo;t be able to derive meaningful insights from the input text. This is why removing stop words is so important.</p>
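<p>You can wrap the two analyses above in one reusable function. The sketch below rests on simplifying assumptions: it tokenizes with a regular expression instead of spaCy, skips numbers and punctuation, and uses a small hypothetical stop-word set. The name <code>top_words</code> is also hypothetical:</p>

```python
import re
from collections import Counter

# Hypothetical stop-word subset for illustration only
STOP_WORDS = {"a", "and", "at", "he", "his", "in", "is", "it", "of", "the"}


def top_words(text, n=3, stop_words=STOP_WORDS):
    """Return the n most common words in text.

    Words are runs of ASCII letters. Pass an empty set as stop_words
    to count every word, stop words included.
    """
    words = re.findall(r"[A-Za-z]+", text)
    if stop_words:
        words = [word for word in words if word.lower() not in stop_words]
    return Counter(words).most_common(n)


sample = ("Gus is a developer. Gus is in London. "
          "The conference is in London and Gus is helping.")
print(top_words(sample))                   # stop words removed
print(top_words(sample, stop_words=set()))  # stop words kept
```

<p>Even on this short sample, keeping stop words pushes <code>is</code> and <code>in</code> into the top three.</p>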
0778 <h2 id="part-of-speech-tagging">Part of Speech Tagging</h2>
0779 <p><strong>Part of speech</strong> or <strong>POS</strong> is a grammatical role that explains how a particular word is used in a sentence. Traditional English grammar recognizes eight parts of speech:</p>
0780 <ol>
0781 <li>Noun</li>
0782 <li>Pronoun</li>
0783 <li>Adjective</li>
0784 <li>Verb</li>
0785 <li>Adverb</li>
0786 <li>Preposition</li>
0787 <li>Conjunction</li>
0788 <li>Interjection</li>
0789 </ol>
0790 <p><strong>Part of speech tagging</strong> is the process of assigning a <strong>POS tag</strong> to each token depending on its usage in the sentence. POS tags are useful for assigning a syntactic category like <strong>noun</strong> or <strong>verb</strong> to each word.</p>
0791 <p>In spaCy, POS tags are available as an attribute on the <code>Token</code> object:</p>
0792 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span><span class="p">:</span>
0793 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">tag_</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">pos_</span><span class="p">,</span> <span class="n">spacy</span><span class="o">.</span><span class="n">explain</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">tag_</span><span class="p">))</span>
0794 <span class="gp">...</span>
0795 <span class="go">Gus NNP PROPN noun, proper singular</span>
0796 <span class="go">Proto NNP PROPN noun, proper singular</span>
0797 <span class="go">is VBZ VERB verb, 3rd person singular present</span>
0798 <span class="go">a DT DET determiner</span>
0799 <span class="go">Python NNP PROPN noun, proper singular</span>
0800 <span class="go">developer NN NOUN noun, singular or mass</span>
0801 <span class="go">currently RB ADV adverb</span>
0802 <span class="go">working VBG VERB verb, gerund or present participle</span>
0803 <span class="go">for IN ADP conjunction, subordinating or preposition</span>
0804 <span class="go">a DT DET determiner</span>
0805 <span class="go">London NNP PROPN noun, proper singular</span>
0806 <span class="go">- HYPH PUNCT punctuation mark, hyphen</span>
0807 <span class="go">based VBN VERB verb, past participle</span>
0808 <span class="go">Fintech NNP PROPN noun, proper singular</span>
0809 <span class="go">company NN NOUN noun, singular or mass</span>
0810 <span class="go">. . PUNCT punctuation mark, sentence closer</span>
0811 <span class="go">He PRP PRON pronoun, personal</span>
0812 <span class="go">is VBZ VERB verb, 3rd person singular present</span>
0813 <span class="go">interested JJ ADJ adjective</span>
0814 <span class="go">in IN ADP conjunction, subordinating or preposition</span>
0815 <span class="go">learning VBG VERB verb, gerund or present participle</span>
0816 <span class="go">Natural NNP PROPN noun, proper singular</span>
0817 <span class="go">Language NNP PROPN noun, proper singular</span>
0818 <span class="go">Processing NNP PROPN noun, proper singular</span>
0819 <span class="go">. . PUNCT punctuation mark, sentence closer</span>
0820 </pre></div>
0821 
0822 <p>Here, two attributes of the <code>Token</code> class are accessed:</p>
0823 <ol>
0824 <li><strong><code>tag_</code></strong> gives the fine-grained part of speech.</li>
0825 <li><strong><code>pos_</code></strong> gives the coarse-grained part of speech.</li>
0826 </ol>
0827 <p><code>spacy.explain</code> gives descriptive details about a particular POS tag. spaCy provides a <a href="https://spaCy.io/api/annotation#pos-tagging">complete tag list</a> along with an explanation for each tag.</p>
0828 <p>Using POS tags, you can extract a particular category of words:</p>
0829 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">nouns</span> <span class="o">=</span> <span class="p">[]</span>
0830 <span class="gp">&gt;&gt;&gt; </span><span class="n">adjectives</span> <span class="o">=</span> <span class="p">[]</span>
0831 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">about_doc</span><span class="p">:</span>
0832 <span class="gp">... </span>    <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">pos_</span> <span class="o">==</span> <span class="s1">&#39;NOUN&#39;</span><span class="p">:</span>
0833 <span class="gp">... </span>        <span class="n">nouns</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
0834 <span class="gp">... </span>    <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">pos_</span> <span class="o">==</span> <span class="s1">&#39;ADJ&#39;</span><span class="p">:</span>
0835 <span class="gp">... </span>        <span class="n">adjectives</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
0836 <span class="gp">...</span>
0837 <span class="gp">&gt;&gt;&gt; </span><span class="n">nouns</span>
0838 <span class="go">[developer, company]</span>
0839 <span class="gp">&gt;&gt;&gt; </span><span class="n">adjectives</span>
0840 <span class="go">[interested]</span>
0841 </pre></div>
0842 
0843 <p>You can use this to derive insights, remove the most common nouns, or see which adjectives are used for a particular noun.</p>
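<p>As a rough illustration of those ideas, the sketch below runs without loading a model. It works on a hypothetical list of <code>(text, pos)</code> pairs, the kind you could build from <code>token.text</code> and <code>token.pos_</code>. The sentence and its tags are made up for the example. The sketch counts nouns with <code>collections.Counter</code>, drops the most common one, and collects the adjectives that directly precede a given noun:</p>

```python
from collections import Counter

# Hypothetical (text, POS) pairs, the kind you could build with
# [(token.text, token.pos_) for token in doc]. Sentence and tags are made up.
tagged = [
    ("Gus", "PROPN"), ("is", "AUX"), ("a", "DET"), ("curious", "ADJ"),
    ("developer", "NOUN"), (".", "PUNCT"), ("The", "DET"),
    ("developer", "NOUN"), ("works", "VERB"), ("at", "ADP"), ("a", "DET"),
    ("small", "ADJ"), ("company", "NOUN"), ("with", "ADP"),
    ("another", "DET"), ("young", "ADJ"), ("developer", "NOUN"), (".", "PUNCT"),
]


def noun_counts(tokens):
    """Count how often each noun occurs, ignoring case."""
    return Counter(text.lower() for text, pos in tokens if pos == "NOUN")


def adjectives_for(tokens, noun):
    """Collect adjectives that directly precede the given noun."""
    return [
        prev_text
        for (prev_text, prev_pos), (text, pos) in zip(tokens, tokens[1:])
        if pos == "NOUN" and text.lower() == noun and prev_pos == "ADJ"
    ]


counts = noun_counts(tagged)
print(counts.most_common())  # [('developer', 3), ('company', 1)]

# Drop the single most common noun and keep the rest.
top = {word for word, _ in counts.most_common(1)}
remaining = [word for word in counts if word not in top]
print(remaining)  # ['company']

print(adjectives_for(tagged, "developer"))  # ['curious', 'young']
```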
0844 <h2 id="visualization-using-displacy">Visualization: Using displaCy</h2>
0845 <p>spaCy comes with a built-in visualizer called <strong>displaCy</strong>. You can use it to visualize a <strong>dependency parse</strong> or <strong>named entities</strong> in a browser or a <a href="https://realpython.com/jupyter-notebook-introduction/">Jupyter notebook</a>.</p>
0846 <p>You can use displaCy to visualize the POS tags and dependencies of tokens:</p>
0847 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">spacy</span> <span class="k">import</span> <span class="n">displacy</span>
0848 <span class="gp">&gt;&gt;&gt; </span><span class="n">about_interest_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;He is interested in learning&#39;</span>
0849 <span class="gp">... </span>    <span class="s1">&#39; Natural Language Processing.&#39;</span><span class="p">)</span>
0850 <span class="gp">&gt;&gt;&gt; </span><span class="n">about_interest_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">about_interest_text</span><span class="p">)</span>
0851 <span class="gp">&gt;&gt;&gt; </span><span class="n">displacy</span><span class="o">.</span><span class="n">serve</span><span class="p">(</span><span class="n">about_interest_doc</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s1">&#39;dep&#39;</span><span class="p">)</span>
0852 </pre></div>
0853 
0854 <p>The code above spins up a simple web server. You can see the visualization by opening <a href="http://127.0.0.1:5000">http://127.0.0.1:5000</a> in your browser:</p>
0855 <figure class="figure mx-auto d-block"><a href="https://files.realpython.com/media/displacy_pos_tags.45059f2bf851.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/displacy_pos_tags.45059f2bf851.png" width="2630" height="600" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_pos_tags.45059f2bf851.png&amp;w=657&amp;sig=a49d6b5a0e5952aea59c0241f61fb09440bb326b 657w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_pos_tags.45059f2bf851.png&amp;w=1315&amp;sig=858218e45ae1e23a87ad42204154aeae77c9cc0c 1315w, https://files.realpython.com/media/displacy_pos_tags.45059f2bf851.png 2630w" sizes="75vw" alt="Displacy: Part of Speech Tagging Demo"/></a><figcaption class="figure-caption text-center">displaCy: Part of Speech Tagging Demo</figcaption></figure>
0856 
0857 <p>In the image above, each token&rsquo;s POS tag is written just below it.</p>
0858 <div class="alert alert-primary" role="alert">
0859 <p><strong>Note:</strong> Here&rsquo;s how you can use displaCy in a Jupyter notebook:</p>
0860 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">displacy</span><span class="o">.</span><span class="n">render</span><span class="p">(</span><span class="n">about_interest_doc</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s1">&#39;dep&#39;</span><span class="p">,</span> <span class="n">jupyter</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
0861 </pre></div>
0862 
0863 </div>
0864 <h2 id="preprocessing-functions">Preprocessing Functions</h2>
0865 <p>You can create a <strong>preprocessing function</strong> that takes text as input and applies the following operations:</p>
0866 <ul>
0867 <li>Lowercases the text</li>
0868 <li>Lemmatizes each token</li>
0869 <li>Removes punctuation symbols</li>
0870 <li>Removes stop words</li>
0871 </ul>
0872 <p>A preprocessing function converts text to an analyzable format. It&rsquo;s necessary for most NLP tasks. Here&rsquo;s an example:</p>
0873 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">is_token_allowed</span><span class="p">(</span><span class="n">token</span><span class="p">):</span>
0874 <span class="gp">... </span>    <span class="sd">&#39;&#39;&#39;</span>
0875 <span class="gp">... </span><span class="sd">        Only allow valid tokens which are not stop words</span>
0876 <span class="gp">... </span><span class="sd">        and punctuation symbols.</span>
0877 <span class="gp">... </span><span class="sd">    &#39;&#39;&#39;</span>
0878 <span class="gp">... </span>    <span class="k">if</span> <span class="p">(</span><span class="ow">not</span> <span class="n">token</span> <span class="ow">or</span> <span class="ow">not</span> <span class="n">token</span><span class="o">.</span><span class="n">text</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="ow">or</span>
0879 <span class="gp">... </span>        <span class="n">token</span><span class="o">.</span><span class="n">is_stop</span> <span class="ow">or</span> <span class="n">token</span><span class="o">.</span><span class="n">is_punct</span><span class="p">):</span>
0880 <span class="gp">... </span>        <span class="k">return</span> <span class="kc">False</span>
0881 <span class="gp">... </span>    <span class="k">return</span> <span class="kc">True</span>
0882 <span class="gp">...</span>
0883 <span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">preprocess_token</span><span class="p">(</span><span class="n">token</span><span class="p">):</span>
0884 <span class="gp">... </span>    <span class="c1"># Reduce token to its lowercase lemma form</span>
0885 <span class="gp">... </span>    <span class="k">return</span> <span class="n">token</span><span class="o">.</span><span class="n">lemma_</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span>
0886 <span class="gp">...</span>
0887 <span class="gp">&gt;&gt;&gt; </span><span class="n">complete_filtered_tokens</span> <span class="o">=</span> <span class="p">[</span><span class="n">preprocess_token</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
0888 <span class="gp">... </span>    <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">complete_doc</span> <span class="k">if</span> <span class="n">is_token_allowed</span><span class="p">(</span><span class="n">token</span><span class="p">)]</span>
0889 <span class="gp">&gt;&gt;&gt; </span><span class="n">complete_filtered_tokens</span>
0890 <span class="go">[&#39;gus&#39;, &#39;proto&#39;, &#39;python&#39;, &#39;developer&#39;, &#39;currently&#39;, &#39;work&#39;,</span>
0891 <span class="go">&#39;london&#39;, &#39;base&#39;, &#39;fintech&#39;, &#39;company&#39;, &#39;interested&#39;, &#39;learn&#39;,</span>
0892 <span class="go">&#39;natural&#39;, &#39;language&#39;, &#39;processing&#39;, &#39;developer&#39;, &#39;conference&#39;,</span>
0893 <span class="go">&#39;happen&#39;, &#39;21&#39;, &#39;july&#39;, &#39;2019&#39;, &#39;london&#39;, &#39;title&#39;,</span>
0894 <span class="go">&#39;applications&#39;, &#39;natural&#39;, &#39;language&#39;, &#39;processing&#39;, &#39;helpline&#39;,</span>
0895 <span class="go">&#39;number&#39;, &#39;available&#39;, &#39;+1&#39;, &#39;1234567891&#39;, &#39;gus&#39;, &#39;help&#39;,</span>
0896 <span class="go">&#39;organize&#39;, &#39;keep&#39;, &#39;organize&#39;, &#39;local&#39;, &#39;python&#39;, &#39;meetup&#39;,</span>
0897 <span class="go">&#39;internal&#39;, &#39;talk&#39;, &#39;workplace&#39;, &#39;gus&#39;, &#39;present&#39;, &#39;talk&#39;, &#39;talk&#39;,</span>
0898 <span class="go">&#39;introduce&#39;, &#39;reader&#39;, &#39;use&#39;, &#39;case&#39;, &#39;natural&#39;, &#39;language&#39;,</span>
0899 <span class="go">&#39;processing&#39;, &#39;fintech&#39;, &#39;apart&#39;, &#39;work&#39;, &#39;passionate&#39;, &#39;music&#39;,</span>
0900 <span class="go">&#39;gus&#39;, &#39;learn&#39;, &#39;play&#39;, &#39;piano&#39;, &#39;enrol&#39;, &#39;weekend&#39;, &#39;batch&#39;,</span>
0901 <span class="go">&#39;great&#39;, &#39;piano&#39;, &#39;academy&#39;, &#39;great&#39;, &#39;piano&#39;, &#39;academy&#39;,</span>
0902 <span class="go">&#39;situate&#39;, &#39;mayfair&#39;, &#39;city&#39;, &#39;london&#39;, &#39;world&#39;, &#39;class&#39;,</span>
0903 <span class="go">&#39;piano&#39;, &#39;instructor&#39;]</span>
0904 </pre></div>
0905 
0906 <p>Note that <code>complete_filtered_tokens</code> doesn&rsquo;t contain any stop words or punctuation symbols, and it consists of lemmatized, lowercase tokens.</p>
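<p>The filtering logic depends only on a few token attributes, so it can be tried without a model. Here&rsquo;s a minimal sketch that assumes a hypothetical <code>SimpleToken</code> dataclass standing in for spaCy&rsquo;s <code>Token</code>, with just the <code>text</code>, <code>lemma_</code>, <code>is_stop</code>, and <code>is_punct</code> attributes. The tokens, lemmas, and flags are filled in by hand:</p>

```python
from dataclasses import dataclass


@dataclass
class SimpleToken:
    """Hypothetical stand-in for spaCy's Token with only the attributes used here."""
    text: str
    lemma_: str
    is_stop: bool = False
    is_punct: bool = False


def is_token_allowed(token):
    """Allow non-blank tokens that are neither stop words nor punctuation."""
    return bool(token.text.strip()) and not token.is_stop and not token.is_punct


def preprocess_token(token):
    """Reduce a token to its lowercase lemma."""
    return token.lemma_.strip().lower()


def preprocess(tokens):
    return [preprocess_token(token) for token in tokens if is_token_allowed(token)]


# Tokens for "Gus is learning the Piano." with hand-written lemmas and flags.
tokens = [
    SimpleToken("Gus", "Gus"),
    SimpleToken("is", "be", is_stop=True),
    SimpleToken("learning", "learn"),
    SimpleToken("the", "the", is_stop=True),
    SimpleToken("Piano", "piano"),
    SimpleToken(".", ".", is_punct=True),
]
print(preprocess(tokens))  # ['gus', 'learn', 'piano']
```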
0907 <h2 id="rule-based-matching-using-spacy">Rule-Based Matching Using spaCy</h2>
0908 <p><strong>Rule-based matching</strong> is one of the steps in extracting information from unstructured text. It&rsquo;s used to identify and extract tokens and phrases according to patterns (such as lowercase) and grammatical features (such as part of speech).</p>
0909 <p>Rule-based matching can use <a href="https://en.wikipedia.org/wiki/Regular_expression">regular expressions</a> to extract entities (such as phone numbers) from unstructured text. The difference from using regular expressions alone is that regular expressions don&rsquo;t consider the lexical and grammatical attributes of the text, while rule-based matching can.</p>
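<p>For contrast, here&rsquo;s what a plain regular-expression approach looks like, using only Python&rsquo;s standard <code>re</code> module. The pattern is an assumption that mirrors the <code>(123) 456-789</code> format used in the phone number example in this section. It works purely on characters and knows nothing about tokens or parts of speech:</p>

```python
import re

text = (
    "There is a developer conference happening on 21 July 2019 in London. "
    "There is a helpline number available at (123) 456-789"
)

# "(ddd) ddd-ddd" with an optional hyphen. The regex matches characters only,
# so it has no notion of tokens, POS tags, or other grammatical attributes.
PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-?\d{3}")

print(PHONE_RE.findall(text))  # ['(123) 456-789']
```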
0910 <p>With rule-based matching, you can extract a first name and a last name, which are always <strong>proper nouns</strong>:</p>
0911 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">spacy.matcher</span> <span class="k">import</span> <span class="n">Matcher</span>
0912 <span class="gp">&gt;&gt;&gt; </span><span class="n">matcher</span> <span class="o">=</span> <span class="n">Matcher</span><span class="p">(</span><span class="n">nlp</span><span class="o">.</span><span class="n">vocab</span><span class="p">)</span>
0913 <span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">extract_full_name</span><span class="p">(</span><span class="n">nlp_doc</span><span class="p">):</span>
0914 <span class="gp">... </span>    <span class="n">pattern</span> <span class="o">=</span> <span class="p">[{</span><span class="s1">&#39;POS&#39;</span><span class="p">:</span> <span class="s1">&#39;PROPN&#39;</span><span class="p">},</span> <span class="p">{</span><span class="s1">&#39;POS&#39;</span><span class="p">:</span> <span class="s1">&#39;PROPN&#39;</span><span class="p">}]</span>
0915 <span class="gp">... </span>    <span class="n">matcher</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s1">&#39;FULL_NAME&#39;</span><span class="p">,</span> <span class="p">[</span><span class="n">pattern</span><span class="p">])</span>
0916 <span class="gp">... </span>    <span class="n">matches</span> <span class="o">=</span> <span class="n">matcher</span><span class="p">(</span><span class="n">nlp_doc</span><span class="p">)</span>
0917 <span class="gp">... </span>    <span class="k">for</span> <span class="n">match_id</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">end</span> <span class="ow">in</span> <span class="n">matches</span><span class="p">:</span>
0918 <span class="gp">... </span>        <span class="n">span</span> <span class="o">=</span> <span class="n">nlp_doc</span><span class="p">[</span><span class="n">start</span><span class="p">:</span><span class="n">end</span><span class="p">]</span>
0919 <span class="gp">... </span>        <span class="k">return</span> <span class="n">span</span><span class="o">.</span><span class="n">text</span>
0920 <span class="gp">...</span>
0921 <span class="gp">&gt;&gt;&gt; </span><span class="n">extract_full_name</span><span class="p">(</span><span class="n">about_doc</span><span class="p">)</span>
0922 <span class="go">&#39;Gus Proto&#39;</span>
0923 </pre></div>
0924 
0925 <p>In this example, <code>pattern</code> is a list of dictionaries, one per token, that defines the combination of tokens to be matched. Both dictionaries require the POS tag <code>PROPN</code> (proper noun), so the pattern matches two consecutive proper nouns. The pattern is then added to <code>matcher</code> under the match ID <code>FULL_NAME</code>. Finally, calling <code>matcher</code> on the <code>Doc</code> returns a list of <code>(match_id, start, end)</code> tuples, where <code>start</code> and <code>end</code> are token indexes, so <code>nlp_doc[start:end]</code> is the matched span. The function returns the text of the first match.</p>
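<p>To make the <code>(start, end)</code> mechanics concrete, here&rsquo;s a simplified, pure-Python sketch of matching a fixed-length token pattern. It isn&rsquo;t how spaCy&rsquo;s <code>Matcher</code> is implemented. The hypothetical <code>match_pattern()</code> function and the hand-tagged tokens are assumptions for illustration. Each token is a dictionary of attributes, and a pattern position matches when every attribute in its dictionary is equal:</p>

```python
def match_pattern(tokens, pattern):
    """Return (start, end) spans where every pattern entry matches its token.

    tokens: list of dicts of token attributes, e.g. {"TEXT": ..., "POS": ...}.
    pattern: list of dicts; each key/value must equal the token's attribute.
    """
    width = len(pattern)
    matches = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(
            all(token.get(attr) == value for attr, value in spec.items())
            for token, spec in zip(window, pattern)
        ):
            matches.append((start, start + width))
    return matches


# Hand-tagged tokens (assumed tags) for "Gus Proto is a Python developer".
tokens = [
    {"TEXT": "Gus", "POS": "PROPN"},
    {"TEXT": "Proto", "POS": "PROPN"},
    {"TEXT": "is", "POS": "AUX"},
    {"TEXT": "a", "POS": "DET"},
    {"TEXT": "Python", "POS": "PROPN"},
    {"TEXT": "developer", "POS": "NOUN"},
]
pattern = [{"POS": "PROPN"}, {"POS": "PROPN"}]

for start, end in match_pattern(tokens, pattern):
    print(" ".join(token["TEXT"] for token in tokens[start:end]))  # Gus Proto
```

<p>Slicing <code>tokens[start:end]</code> here plays the same role as <code>nlp_doc[start:end]</code> in the spaCy example.</p>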
0926 <p>You can also use rule-based matching to extract phone numbers:</p>
0927 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">spacy.matcher</span> <span class="k">import</span> <span class="n">Matcher</span>
0928 <span class="gp">&gt;&gt;&gt; </span><span class="n">matcher</span> <span class="o">=</span> <span class="n">Matcher</span><span class="p">(</span><span class="n">nlp</span><span class="o">.</span><span class="n">vocab</span><span class="p">)</span>
0929 <span class="gp">&gt;&gt;&gt; </span><span class="n">conference_org_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;There is a developer conference&#39;</span>
0930 <span class="gp">... </span>    <span class="s1">&#39;happening on 21 July 2019 in London. It is titled&#39;</span>
0931 <span class="gp">... </span>    <span class="s1">&#39; &quot;Applications of Natural Language Processing&quot;.&#39;</span>
0932 <span class="gp">... </span>    <span class="s1">&#39; There is a helpline number available&#39;</span>
0933 <span class="gp">... </span>    <span class="s1">&#39; at (123) 456-789&#39;</span><span class="p">)</span>
0934 <span class="gp">...</span>
0935 <span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">extract_phone_number</span><span class="p">(</span><span class="n">nlp_doc</span><span class="p">):</span>
0936 <span class="gp">... </span>    <span class="n">pattern</span> <span class="o">=</span> <span class="p">[{</span><span class="s1">&#39;ORTH&#39;</span><span class="p">:</span> <span class="s1">&#39;(&#39;</span><span class="p">},</span> <span class="p">{</span><span class="s1">&#39;SHAPE&#39;</span><span class="p">:</span> <span class="s1">&#39;ddd&#39;</span><span class="p">},</span>
0937 <span class="gp">... </span>               <span class="p">{</span><span class="s1">&#39;ORTH&#39;</span><span class="p">:</span> <span class="s1">&#39;)&#39;</span><span class="p">},</span> <span class="p">{</span><span class="s1">&#39;SHAPE&#39;</span><span class="p">:</span> <span class="s1">&#39;ddd&#39;</span><span class="p">},</span>
0938 <span class="gp">... </span>               <span class="p">{</span><span class="s1">&#39;ORTH&#39;</span><span class="p">:</span> <span class="s1">&#39;-&#39;</span><span class="p">,</span> <span class="s1">&#39;OP&#39;</span><span class="p">:</span> <span class="s1">&#39;?&#39;</span><span class="p">},</span>
0939 <span class="gp">... </span>               <span class="p">{</span><span class="s1">&#39;SHAPE&#39;</span><span class="p">:</span> <span class="s1">&#39;ddd&#39;</span><span class="p">}]</span>
0940 <span class="gp">... </span>    <span class="n">matcher</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s1">&#39;PHONE_NUMBER&#39;</span><span class="p">,</span> <span class="p">[</span><span class="n">pattern</span><span class="p">])</span>
0941 <span class="gp">... </span>    <span class="n">matches</span> <span class="o">=</span> <span class="n">matcher</span><span class="p">(</span><span class="n">nlp_doc</span><span class="p">)</span>
0942 <span class="gp">... </span>    <span class="k">for</span> <span class="n">match_id</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">end</span> <span class="ow">in</span> <span class="n">matches</span><span class="p">:</span>
0943 <span class="gp">... </span>        <span class="n">span</span> <span class="o">=</span> <span class="n">nlp_doc</span><span class="p">[</span><span class="n">start</span><span class="p">:</span><span class="n">end</span><span class="p">]</span>
0944 <span class="gp">... </span>        <span class="k">return</span> <span class="n">span</span><span class="o">.</span><span class="n">text</span>
0945 <span class="gp">...</span>
0946 <span class="gp">&gt;&gt;&gt; </span><span class="n">conference_org_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">conference_org_text</span><span class="p">)</span>
0947 <span class="gp">&gt;&gt;&gt; </span><span class="n">extract_phone_number</span><span class="p">(</span><span class="n">conference_org_doc</span><span class="p">)</span>
0948 <span class="go">&#39;(123) 456-789&#39;</span>
0949 </pre></div>
0950 
0951 <p>Compared with the previous example, only the pattern changes, this time to match phone numbers. It also uses some additional token attributes:</p>
0952 <ul>
0953 <li><strong><code>ORTH</code></strong> gives the exact text of the token.</li>
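0954 <li><strong><code>SHAPE</code></strong> transforms the token string to show orthographic features: uppercase letters become <code>X</code>, lowercase letters become <code>x</code>, and digits become <code>d</code>, so <code>456</code> has the shape <code>ddd</code>.</li>
0955 <li><strong><code>OP</code></strong> defines operators. Using <code>?</code> as a value makes that token optional, so it can match 0 or 1 times.</li>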
0956 </ul>
0957 <div class="alert alert-primary" role="alert">
0958 <p><strong>Note:</strong> For simplicity, phone numbers are assumed to be of a particular format: <code>(123) 456-789</code>. You can change this depending on your use case.</p>
0959 </div>
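<p>The following sketch shows what <code>SHAPE</code> and the <code>?</code> operator do, using plain Python and pre-split string tokens. The <code>word_shape()</code> function is a simplified approximation of spaCy&rsquo;s word shape (letters become <code>X</code> or <code>x</code>, digits become <code>d</code>, and runs of the same shape character are cut off after four). The tiny backtracking matcher supports only <code>ORTH</code>, <code>SHAPE</code>, and <code>OP: '?'</code>. Both are hypothetical helpers written for this illustration, not spaCy&rsquo;s API:</p>

```python
def word_shape(text):
    """Approximate word shape: X/x for letters, d for digits, others kept.

    Runs of the same shape character are truncated after four characters.
    """
    shape = []
    last, run = "", 0
    for char in text:
        if char.isalpha():
            code = "X" if char.isupper() else "x"
        elif char.isdigit():
            code = "d"
        else:
            code = char
        run = run + 1 if code == last else 1
        last = code
        if run <= 4:
            shape.append(code)
    return "".join(shape)


def token_matches(token, spec):
    """Check the ORTH and SHAPE constraints of one pattern entry."""
    if "ORTH" in spec and token != spec["ORTH"]:
        return False
    if "SHAPE" in spec and word_shape(token) != spec["SHAPE"]:
        return False
    return True


def match_at(tokens, pattern, i):
    """Return the end index if the pattern matches at tokens[i], else None."""
    if not pattern:
        return i
    spec, rest = pattern[0], pattern[1:]
    if i < len(tokens) and token_matches(tokens[i], spec):
        end = match_at(tokens, rest, i + 1)
        if end is not None:
            return end
    if spec.get("OP") == "?":
        # Optional entry: also try skipping it without consuming a token.
        return match_at(tokens, rest, i)
    return None


def find_matches(tokens, pattern):
    """Collect (start, end) for the first match found at each start index."""
    matches = []
    for start in range(len(tokens)):
        end = match_at(tokens, pattern, start)
        if end is not None:
            matches.append((start, end))
    return matches


pattern = [{"ORTH": "("}, {"SHAPE": "ddd"}, {"ORTH": ")"}, {"SHAPE": "ddd"},
           {"ORTH": "-", "OP": "?"}, {"SHAPE": "ddd"}]

with_hyphen = ["at", "(", "123", ")", "456", "-", "789"]
without_hyphen = ["at", "(", "123", ")", "456", "789"]
print(word_shape("456"), word_shape("London"))  # ddd Xxxxx
print(find_matches(with_hyphen, pattern))     # [(1, 7)]
print(find_matches(without_hyphen, pattern))  # [(1, 6)]
```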
0960 <p>Rule-based matching helps you identify and extract tokens and phrases according to lexical patterns (such as lowercase) and grammatical features (such as part of speech).</p>
0961 <h2 id="dependency-parsing-using-spacy">Dependency Parsing Using spaCy</h2>
0962 <p><strong>Dependency parsing</strong> is the process of extracting the dependency parse of a sentence to represent its grammatical structure. It defines the dependency relationship between <strong>headwords</strong> and their <strong>dependents</strong>. The head of a sentence has no dependency and is called the <strong>root of the sentence</strong>. The <strong>verb</strong> is usually the head of the sentence. All other words are linked to the headword.</p>
0963 <p>The dependencies can be represented as a directed graph in which:</p>
0964 <ul>
0965 <li>Words are the nodes.</li>
0966 <li>The grammatical relationships are the edges.</li>
0967 </ul>
0968 <p>Dependency parsing helps you know what role a word plays in the text and how different words relate to each other. It&rsquo;s also used in <strong>shallow parsing</strong> and named entity recognition.</p>
0969 <p>Here&rsquo;s how you can use dependency parsing to see the relationships between words:</p>
0970 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">piano_text</span> <span class="o">=</span> <span class="s1">&#39;Gus is learning piano&#39;</span>
0971 <span class="gp">&gt;&gt;&gt; </span><span class="n">piano_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">piano_text</span><span class="p">)</span>
0972 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">piano_doc</span><span class="p">:</span>
0973 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">text</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">tag_</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">head</span><span class="o">.</span><span class="n">text</span><span class="p">,</span> <span class="n">token</span><span class="o">.</span><span class="n">dep_</span><span class="p">)</span>
0974 <span class="gp">...</span>
0975 <span class="go">Gus NNP learning nsubj</span>
0976 <span class="go">is VBZ learning aux</span>
0977 <span class="go">learning VBG learning ROOT</span>
0978 <span class="go">piano NN learning dobj</span>
0979 </pre></div>
0980 
0981 <p>In this example, the sentence contains three relationships:</p>
0982 <ol>
0983 <li><strong><code>nsubj</code></strong> is the nominal subject of the verb (<code>Gus</code>). Its headword is the verb <code>learning</code>.</li>
0984 <li><strong><code>aux</code></strong> is an auxiliary verb (<code>is</code>). Its headword is the verb <code>learning</code>.</li>
0985 <li><strong><code>dobj</code></strong> is the direct object of the verb (<code>piano</code>). Its headword is the verb <code>learning</code>.</li>
0986 </ol>
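<p>The printed output above can be stored as the directed graph described earlier, with words as nodes and labeled edges from each head to its dependents. This is a plain-Python sketch. The rows are copied from the output, and the hypothetical <code>build_graph()</code> helper keys nodes on word text, which only works here because no word repeats. A real implementation would key on token positions, such as spaCy&rsquo;s <code>token.i</code>:</p>

```python
from collections import defaultdict

# (word, head word, dependency label) rows copied from the output above.
parse = [
    ("Gus", "learning", "nsubj"),
    ("is", "learning", "aux"),
    ("learning", "learning", "ROOT"),
    ("piano", "learning", "dobj"),
]


def build_graph(rows):
    """Return (root, graph), where graph maps a head to its (label, word) edges."""
    graph = defaultdict(list)
    root = None
    for word, head, label in rows:
        if label == "ROOT":
            # The root is its own head, so it gets no incoming edge.
            root = word
        else:
            graph[head].append((label, word))
    return root, dict(graph)


root, graph = build_graph(parse)
print(root)         # learning
print(graph[root])  # [('nsubj', 'Gus'), ('aux', 'is'), ('dobj', 'piano')]
```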
0987 <p>The Stanford dependencies manual has a detailed <a href="https://nlp.stanford.edu/software/dependencies_manual.pdf">list of relationships</a> with descriptions. You can use displaCy to visualize the dependency tree:</p>
0988 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">displacy</span><span class="o">.</span><span class="n">serve</span><span class="p">(</span><span class="n">piano_doc</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s1">&#39;dep&#39;</span><span class="p">)</span>
0989 </pre></div>
0990 
0991 <p>This code will produce a visualization that can be accessed by opening <a href="http://127.0.0.1:5000">http://127.0.0.1:5000</a> in your browser:</p>
0992 <figure class="figure mx-auto d-block"><a href="https://files.realpython.com/media/displacy_dependency_parse.de72f9b1d115.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/displacy_dependency_parse.de72f9b1d115.png" width="1278" height="596" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_dependency_parse.de72f9b1d115.png&amp;w=319&amp;sig=111728c07cf2e1f64b8419cfce8a5f880c244d03 319w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_dependency_parse.de72f9b1d115.png&amp;w=639&amp;sig=f90a72529d7bc2d2dd944af3c02bbf487b65aaf3 639w, https://files.realpython.com/media/displacy_dependency_parse.de72f9b1d115.png 1278w" sizes="75vw" alt="Displacy: Dependency Parse Demo"/></a><figcaption class="figure-caption text-center">displaCy: Dependency Parse Demo</figcaption></figure>
0993 
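0994 <p>This image shows that the subject of the sentence is the proper noun <code>Gus</code> and that it is linked to <code>piano</code> through the verb <code>learning</code>: <code>Gus</code> is the <code>nsubj</code> of <code>learning</code>, and <code>piano</code> is its <code>dobj</code>.</p>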
0995 <h2 id="navigating-the-tree-and-subtree">Navigating the Tree and Subtree</h2>
0996 <p>The dependency parse tree has all the properties of a <a href="https://en.wikipedia.org/wiki/Tree_(data_structure)">tree</a>. This tree contains information about sentence structure and grammar and can be traversed in different ways to extract relationships.</p>
0997 <p>spaCy provides attributes like <code>children</code>, <code>lefts</code>, <code>rights</code>, and <code>subtree</code> to navigate the parse tree:</p>
0998 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">one_line_about_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;Gus Proto is a Python developer&#39;</span>
0999 <span class="gp">... </span>    <span class="s1">&#39; currently working for a London-based Fintech company&#39;</span><span class="p">)</span>
1000 <span class="gp">&gt;&gt;&gt; </span><span class="n">one_line_about_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">one_line_about_text</span><span class="p">)</span>
1001 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Extract children of `developer`</span>
1002 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">children</span><span class="p">])</span>
1003 <span class="go">[&#39;a&#39;, &#39;Python&#39;, &#39;working&#39;]</span>
1004 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Extract previous neighboring node of `developer`</span>
1005 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span> <span class="p">(</span><span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">nbor</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">))</span>
1006 <span class="go">Python</span>
1007 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Extract next neighboring node of `developer`</span>
1008 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span> <span class="p">(</span><span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">nbor</span><span class="p">())</span>
1009 <span class="go">currently</span>
1010 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Extract all tokens on the left of `developer`</span>
1011 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">lefts</span><span class="p">])</span>
1012 <span class="go">[&#39;a&#39;, &#39;Python&#39;]</span>
1013 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Extract tokens on the right of `developer`</span>
1014 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">rights</span><span class="p">])</span>
1015 <span class="go">[&#39;working&#39;]</span>
1016 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Print subtree of `developer`</span>
1017 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span> <span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">subtree</span><span class="p">))</span>
1018 <span class="go">[a, Python, developer, currently, working, for, a, London, -,</span>
1019 <span class="go">based, Fintech, company]</span>
1020 </pre></div>
1021 
1022 <p>You can construct a function that takes a subtree as an argument and returns a string by merging words in it:</p>
1023 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">flatten_tree</span><span class="p">(</span><span class="n">tree</span><span class="p">):</span>
1024 <span class="gp">... </span>    <span class="k">return</span> <span class="s1">&#39;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="n">token</span><span class="o">.</span><span class="n">text_with_ws</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="nb">list</span><span class="p">(</span><span class="n">tree</span><span class="p">)])</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
1025 <span class="gp">...</span>
1026 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Print flattened subtree of `developer`</span>
1027 <span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span> <span class="p">(</span><span class="n">flatten_tree</span><span class="p">(</span><span class="n">one_line_about_doc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">subtree</span><span class="p">))</span>
1028 <span class="go">a Python developer currently working for a London-based Fintech company</span>
1029 </pre></div>
1030 
1031 <p>You can use this function to print all the tokens in a subtree.</p>
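<p>All of these attributes are derived from each token&rsquo;s head. As a rough, model-free sketch of what <code>children</code>, <code>lefts</code>, <code>rights</code>, and <code>subtree</code> compute, the code below uses a hypothetical, hand-written parse of <code>Gus plays the old piano</code>. The parse is stored as a list of head indexes in which the root is its own head, the same convention spaCy uses. The helper functions are illustrative, not spaCy&rsquo;s API:</p>

```python
# Hypothetical hand-written parse of "Gus plays the old piano".
# heads[i] is the index of token i's head; the root ("plays") is its own head.
words = ["Gus", "plays", "the", "old", "piano"]
heads = [1, 1, 4, 4, 1]


def children(heads, i):
    """Indexes of the tokens whose head is token i (the root is skipped)."""
    return [j for j, head in enumerate(heads) if head == i and j != i]


def lefts(heads, i):
    """Children that appear before token i in the sentence."""
    return [j for j in children(heads, i) if j < i]


def rights(heads, i):
    """Children that appear after token i in the sentence."""
    return [j for j in children(heads, i) if j > i]


def subtree(heads, i):
    """Token i and all of its descendants, in sentence order."""
    nodes = [i]
    for child in children(heads, i):
        nodes.extend(subtree(heads, child))
    return sorted(nodes)


def texts(indexes):
    return [words[j] for j in indexes]


print(texts(children(heads, 4)))           # ['the', 'old']
print(texts(lefts(heads, 1)))              # ['Gus']
print(texts(rights(heads, 1)))             # ['piano']
print(" ".join(texts(subtree(heads, 4))))  # the old piano
```

<p>Joining the subtree&rsquo;s words, as in the last line, is the same idea as <code>flatten_tree()</code>, minus the whitespace handling that <code>text_with_ws</code> provides.</p>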
1032 <h2 id="shallow-parsing">Shallow Parsing</h2>
1033 <p><strong>Shallow parsing</strong>, or <strong>chunking</strong>, is the process of extracting phrases from unstructured text. Chunking groups adjacent tokens into phrases on the basis of their POS tags. Well-known chunk types include noun phrases, verb phrases, and prepositional phrases.</p>
1034 <h3 id="noun-phrase-detection">Noun Phrase Detection</h3>
1035 <p>A noun phrase is a phrase that has a noun as its head. It can also include other kinds of words, such as adjectives, ordinals, and determiners. Noun phrases are useful for explaining the context of the sentence. They help you infer <em>what</em> is being talked about in the sentence.</p>
1036 <p>spaCy&rsquo;s <code>Doc</code> object has a <code>noun_chunks</code> property, which you can use to extract noun phrases:</p>
1037 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">conference_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;There is a developer conference&#39;</span>
1038 <span class="gp">... </span>    <span class="s1">&#39; happening on 21 July 2019 in London.&#39;</span><span class="p">)</span>
1039 <span class="gp">&gt;&gt;&gt; </span><span class="n">conference_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">conference_text</span><span class="p">)</span>
1040 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Extract Noun Phrases</span>
1041 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">chunk</span> <span class="ow">in</span> <span class="n">conference_doc</span><span class="o">.</span><span class="n">noun_chunks</span><span class="p">:</span>
1042 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>
1043 <span class="gp">...</span>
1044 <span class="go">a developer conference</span>
1045 <span class="go">21 July</span>
1046 <span class="go">London</span>
1047 </pre></div>
1048 
1049 <p>By looking at noun phrases, you can get information about your text. For example, <code>a developer conference</code> indicates that the text mentions a conference, while the date <code>21 July</code> tells you when the conference is scheduled, so you can work out whether it is in the past or the future. <code>London</code> tells you where the conference takes place.</p>
1050 <h3 id="verb-phrase-detection">Verb Phrase Detection</h3>
1051 <p>A <strong>verb phrase</strong> is a syntactic unit composed of at least one verb. This verb can be followed by other chunks, such as noun phrases. Verb phrases are useful for understanding the actions that nouns are involved in. </p>
1052 <p>spaCy has no built-in functionality to extract verb phrases, so you&rsquo;ll need a library called <a href="https://chartbeat-labs.github.io/textacy/"><code>textacy</code></a>:</p>
1053 <div class="alert alert-primary" role="alert">
1054 <p><strong>Note:</strong> </p>
1055 <p>You can use <code>pip</code> to install <code>textacy</code>:</p>
1056 <div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install textacy
1057 </pre></div>
1058 
1059 </div>
1060 <p>Now that you have <code>textacy</code> installed, you can use it to extract verb phrases based on grammar rules:</p>
1061 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">textacy</span>
1062 <span class="gp">&gt;&gt;&gt; </span><span class="n">about_talk_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;The talk will introduce reader about Use&#39;</span>
1063 <span class="gp">... </span>                   <span class="s1">&#39; cases of Natural Language Processing in&#39;</span>
1064 <span class="gp">... </span>                   <span class="s1">&#39; Fintech&#39;</span><span class="p">)</span>
1065 <span class="gp">&gt;&gt;&gt; </span><span class="n">pattern</span> <span class="o">=</span> <span class="sa">r</span><span class="s1">&#39;(&lt;VERB&gt;?&lt;ADV&gt;*&lt;VERB&gt;+)&#39;</span>
1066 <span class="gp">&gt;&gt;&gt; </span><span class="n">about_talk_doc</span> <span class="o">=</span> <span class="n">textacy</span><span class="o">.</span><span class="n">make_spacy_doc</span><span class="p">(</span><span class="n">about_talk_text</span><span class="p">,</span>
1067 <span class="gp">... </span>                                        <span class="n">lang</span><span class="o">=</span><span class="s1">&#39;en_core_web_sm&#39;</span><span class="p">)</span>
1068 <span class="gp">&gt;&gt;&gt; </span><span class="n">verb_phrases</span> <span class="o">=</span> <span class="n">textacy</span><span class="o">.</span><span class="n">extract</span><span class="o">.</span><span class="n">pos_regex_matches</span><span class="p">(</span><span class="n">about_talk_doc</span><span class="p">,</span> <span class="n">pattern</span><span class="p">)</span>
1069 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Print all verb phrases</span>
1070 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">chunk</span> <span class="ow">in</span> <span class="n">verb_phrases</span><span class="p">:</span>
1071 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">chunk</span><span class="o">.</span><span class="n">text</span><span class="p">)</span>
1072 <span class="gp">...</span>
1073 <span class="go">will introduce</span>
1074 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Extract noun phrases to see which nouns are involved</span>
1075 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">chunk</span> <span class="ow">in</span> <span class="n">about_talk_doc</span><span class="o">.</span><span class="n">noun_chunks</span><span class="p">:</span>
1076 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>
1077 <span class="gp">...</span>
1078 <span class="go">The talk</span>
1079 <span class="go">reader</span>
1080 <span class="go">Use cases</span>
1081 <span class="go">Natural Language Processing</span>
1082 <span class="go">Fintech</span>
1083 </pre></div>
1084 
1085 <p>In this example, the verb phrase <code>will introduce</code> indicates that something will be introduced. By looking at noun phrases, you can see that there is a <code>talk</code> that will <code>introduce</code> the <code>reader</code> to <code>use cases</code> of <code>Natural Language Processing</code> in <code>Fintech</code>.</p>
1086 <p>The above code extracts all the verb phrases <a href="https://chartbeat-labs.github.io/textacy/api_reference/information_extraction.html?highlight=pos#textacy.extract.pos_regex_matches">using a regular expression pattern</a> of POS tags. You can tweak the pattern for verb phrases depending upon your use case.</p>
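<p>You can sketch the idea of matching a regular expression against POS tags without textacy. This is not textacy&rsquo;s implementation. It assumes hand-assigned tags and maps each tag to a single character, an arbitrary encoding chosen here. Because of that mapping, the start and end of every regex match are also token indices. <code>pos_regex_chunks()</code>, <code>TAG_CODES</code>, and <code>VERB_PHRASE</code> are hypothetical names.</p>

```python
import re

# Each POS tag becomes one character, so regex match offsets equal token indices.
# The letter codes are an arbitrary choice for this sketch.
TAG_CODES = {"VERB": "V", "ADV": "A"}
# Mirrors the textacy pattern above: optional verb, any adverbs, one or more verbs.
VERB_PHRASE = re.compile(r"V?A*V+")


def pos_regex_chunks(tagged, pattern=VERB_PHRASE):
    """Return the text of each token run whose tag codes match pattern."""
    # Tags without a code map to "x", which the pattern never matches.
    codes = "".join(TAG_CODES.get(pos, "x") for _, pos in tagged)
    return [
        " ".join(text for text, _ in tagged[m.start():m.end()])
        for m in pattern.finditer(codes)
    ]


# Hand-assigned tags; "will" is tagged VERB here so the sketch reproduces the output above.
talk_tags = [
    ("The", "DET"), ("talk", "NOUN"), ("will", "VERB"),
    ("introduce", "VERB"), ("reader", "NOUN"), ("about", "ADP"),
    ("Use", "NOUN"), ("cases", "NOUN"), ("of", "ADP"),
    ("Natural", "PROPN"), ("Language", "PROPN"), ("Processing", "PROPN"),
    ("in", "ADP"), ("Fintech", "PROPN"),
]
print(pos_regex_chunks(talk_tags))  # ['will introduce']
```

<p>With this encoding, you tweak the pattern by editing <code>VERB_PHRASE</code> or the tag codes. For example, you could add <code>"AUX": "V"</code> if your tagger labels <em>will</em> as an auxiliary.</p>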
1087 <div class="alert alert-primary" role="alert">
1088 <p><strong>Note:</strong> In the previous example, you could have also done dependency parsing to see what the <a href="https://nlp.stanford.edu/software/dependencies_manual.pdf">relationships</a> between the words were.</p>
1089 </div>
1090 <h2 id="named-entity-recognition">Named Entity Recognition</h2>
1091 <p><strong>Named Entity Recognition</strong> (NER) is the process of locating <strong>named entities</strong> in unstructured text and then classifying them into pre-defined categories, such as person names, organizations, locations, monetary values, percentages, time expressions, and so on.</p>
1092 <p>You can use <strong>NER</strong> to learn more about the meaning of your text. For example, you could use it to populate tags for a set of documents to improve keyword search. You could also use it to categorize customer support tickets into relevant categories.</p>
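<p>This rough sketch of the keyword-search idea builds an index that maps entity text to the documents that mention it. It assumes that NER has already run. The <code>doc_entities</code> dictionary is written by hand: the <code>piano-class</code> entry mirrors the output in the next example, while the other entries and all document IDs are made up. <code>build_tag_index()</code> and <code>search()</code> are hypothetical helpers.</p>

```python
from collections import defaultdict

# Hand-written stand-in for NER output: document ID -> (entity text, label) pairs.
doc_entities = {
    "piano-class": [("Great Piano Academy", "ORG"), ("Mayfair", "GPE"),
                    ("the City of London", "GPE")],
    "conference": [("London", "GPE")],
    "events": [("Mayfair", "GPE"), ("London", "GPE")],
}


def build_tag_index(entities_by_doc):
    """Map each lowercased entity text to the set of documents that mention it."""
    index = defaultdict(set)
    for doc_id, entities in entities_by_doc.items():
        for text, _label in entities:
            index[text.lower()].add(doc_id)
    return index


def search(index, keyword):
    """Return the sorted IDs of documents tagged with keyword (case-insensitive)."""
    # .get() does not insert missing keys into the defaultdict
    return sorted(index.get(keyword.lower(), set()))


index = build_tag_index(doc_entities)
print(search(index, "mayfair"))  # ['events', 'piano-class']
```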
1093 <p>spaCy has the property <code>ents</code> on <code>Doc</code> objects. You can use it to extract named entities:</p>
1094 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">piano_class_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;Great Piano Academy is situated&#39;</span>
1095 <span class="gp">... </span>    <span class="s1">&#39; in Mayfair or the City of London and has&#39;</span>
1096 <span class="gp">... </span>    <span class="s1">&#39; world-class piano instructors.&#39;</span><span class="p">)</span>
1097 <span class="gp">&gt;&gt;&gt; </span><span class="n">piano_class_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">piano_class_text</span><span class="p">)</span>
1098 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">ent</span> <span class="ow">in</span> <span class="n">piano_class_doc</span><span class="o">.</span><span class="n">ents</span><span class="p">:</span>
1099 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">ent</span><span class="o">.</span><span class="n">text</span><span class="p">,</span> <span class="n">ent</span><span class="o">.</span><span class="n">start_char</span><span class="p">,</span> <span class="n">ent</span><span class="o">.</span><span class="n">end_char</span><span class="p">,</span>
1100 <span class="gp">... </span>          <span class="n">ent</span><span class="o">.</span><span class="n">label_</span><span class="p">,</span> <span class="n">spacy</span><span class="o">.</span><span class="n">explain</span><span class="p">(</span><span class="n">ent</span><span class="o">.</span><span class="n">label_</span><span class="p">))</span>
1101 <span class="gp">...</span>
1102 <span class="go">Great Piano Academy 0 19 ORG Companies, agencies, institutions, etc.</span>
1103 <span class="go">Mayfair 35 42 GPE Countries, cities, states</span>
1104 <span class="go">the City of London 46 64 GPE Countries, cities, states</span>
1105 </pre></div>
1106 
1107 <p>In the above example, <code>ent</code> is a <a href="https://spacy.io/api/span"><code>Span</code></a> object with various attributes:</p>
1108 <ul>
1109 <li><strong><code>text</code></strong> gives the Unicode text representation of the entity.</li>
1110 <li><strong><code>start_char</code></strong> denotes the character offset for the start of the entity.</li>
1111 <li><strong><code>end_char</code></strong> denotes the character offset for the end of the entity.</li>
1112 <li><strong><code>label_</code></strong> gives the label of the entity.</li>
1113 </ul>
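<p>Because <code>end_char</code> is exclusive, the two offsets behave like Python slice bounds on the original string. The check below copies the text and the offsets from the REPL output above instead of calling spaCy, so it only demonstrates the slicing relationship.</p>

```python
# Text and offsets copied from the REPL session above; spaCy is not called here.
piano_class_text = ('Great Piano Academy is situated'
                    ' in Mayfair or the City of London and has'
                    ' world-class piano instructors.')

# (text, start_char, end_char) as printed for each entity above
entities = [
    ('Great Piano Academy', 0, 19),
    ('Mayfair', 35, 42),
    ('the City of London', 46, 64),
]

for text, start, end in entities:
    # end_char is exclusive, so the offsets work directly as a slice
    assert piano_class_text[start:end] == text
    print(start, end, piano_class_text[start:end])
```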
1114 <p><code>spacy.explain</code> gives descriptive details about an entity label. The spaCy model has a pre-trained <a href="https://spaCy.io/api/annotation#named-entities">list of entity classes</a>. You can use displaCy to visualize these entities:</p>
1115 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">displacy</span><span class="o">.</span><span class="n">serve</span><span class="p">(</span><span class="n">piano_class_doc</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s1">&#39;ent&#39;</span><span class="p">)</span>
1116 </pre></div>
1117 
1118 <p>If you open <a href="http://127.0.0.1:5000">http://127.0.0.1:5000</a> in your browser, then you can see the visualization:</p>
1119 <figure class="figure mx-auto d-block"><a href="https://files.realpython.com/media/displacy_ner.1fba6869638f.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/displacy_ner.1fba6869638f.png" width="1930" height="140" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_ner.1fba6869638f.png&amp;w=482&amp;sig=18b93b0aed61930a6eedd37dbd12fbbce22733d4 482w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/displacy_ner.1fba6869638f.png&amp;w=965&amp;sig=f6b3cfb460053397a23a0eb49ebc22cf05dd15ab 965w, https://files.realpython.com/media/displacy_ner.1fba6869638f.png 1930w" sizes="75vw" alt="Displacy: Named Entity Recognition Demo"/></a><figcaption class="figure-caption text-center">displaCy: Named Entity Recognition Demo</figcaption></figure>
1120 
1121 <p>You can use NER to redact people&rsquo;s names from a text. For example, you might want to do this in order to hide personal information collected in a survey. You can use spaCy to do that:</p>
1122 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">survey_text</span> <span class="o">=</span> <span class="p">(</span><span class="s1">&#39;Out of 5 people surveyed, James Robert,&#39;</span>
1123 <span class="gp">... </span>               <span class="s1">&#39; Julie Fuller and Benjamin Brooks like&#39;</span>
1124 <span class="gp">... </span>               <span class="s1">&#39; apples. Kelly Cox and Matthew Evans&#39;</span>
1125 <span class="gp">... </span>               <span class="s1">&#39; like oranges.&#39;</span><span class="p">)</span>
1126 <span class="gp">...</span>
1127 <span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">replace_person_names</span><span class="p">(</span><span class="n">token</span><span class="p">):</span>
1128 <span class="gp">... </span>    <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">ent_iob</span> <span class="o">!=</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">token</span><span class="o">.</span><span class="n">ent_type_</span> <span class="o">==</span> <span class="s1">&#39;PERSON&#39;</span><span class="p">:</span>
1129 <span class="gp">... </span>        <span class="k">return</span> <span class="s1">&#39;[REDACTED] &#39;</span>
1130 <span class="gp">... </span>    <span class="k">return</span> <span class="n">token</span><span class="o">.</span><span class="n">text_with_ws</span>
1131 <span class="gp">...</span>
1132 <span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">redact_names</span><span class="p">(</span><span class="n">nlp_doc</span><span class="p">):</span>
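1133 <span class="gp">... </span>    <span class="k">with</span> <span class="n">nlp_doc</span><span class="o">.</span><span class="n">retokenize</span><span class="p">()</span> <span class="k">as</span> <span class="n">retokenizer</span><span class="p">:</span>
1134 <span class="gp">... </span>        <span class="k">for</span> <span class="n">ent</span> <span class="ow">in</span> <span class="n">nlp_doc</span><span class="o">.</span><span class="n">ents</span><span class="p">:</span>
<span class="gp">... </span>            <span class="n">retokenizer</span><span class="o">.</span><span class="n">merge</span><span class="p">(</span><span class="n">ent</span><span class="p">)</span>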
1135 <span class="gp">... </span>    <span class="n">tokens</span> <span class="o">=</span> <span class="nb">map</span><span class="p">(</span><span class="n">replace_person_names</span><span class="p">,</span> <span class="n">nlp_doc</span><span class="p">)</span>
1136 <span class="gp">... </span>    <span class="k">return</span> <span class="s1">&#39;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span>
1137 <span class="gp">...</span>
1138 <span class="gp">&gt;&gt;&gt; </span><span class="n">survey_doc</span> <span class="o">=</span> <span class="n">nlp</span><span class="p">(</span><span class="n">survey_text</span><span class="p">)</span>
1139 <span class="gp">&gt;&gt;&gt; </span><span class="n">redact_names</span><span class="p">(</span><span class="n">survey_doc</span><span class="p">)</span>
1140 <span class="go">&#39;Out of 5 people surveyed, [REDACTED] , [REDACTED] and&#39;</span>
1141 <span class="go">&#39; [REDACTED] like apples. [REDACTED] and [REDACTED]&#39;</span>
1142 <span class="go">&#39; like oranges.&#39;</span>
1143 </pre></div>
1144 
1145 <p>In this example, <code>replace_person_names()</code> uses <code>ent_iob</code>, which gives the IOB code of the named entity tag using <a href="https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)">inside-outside-beginning (IOB) tagging</a>. The check <code>token.ent_iob != 0</code> skips tokens without an entity tag, because an IOB code of zero means that no tag is set.</p>
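<p>You can also redact names without merging entities by reading the IOB codes directly. According to spaCy&rsquo;s <code>Token</code> documentation, <code>ent_iob</code> is 3 when a token begins an entity, 1 when it is inside one, 2 when it is outside, and 0 when no tag is set. In this sketch, the tuples are a hand-written stand-in for the <code>text_with_ws</code>, <code>ent_iob</code>, and <code>ent_type_</code> attributes of spaCy tokens. <code>redact_people()</code> is a hypothetical helper. Because the sketch takes the trailing whitespace from the last token of each name, it also avoids the stray space before the comma in the output above.</p>

```python
# Integer IOB codes as documented for spaCy's Token.ent_iob.
NO_TAG, INSIDE, OUTSIDE, BEGIN = 0, 1, 2, 3


def redact_people(tokens):
    """Replace every PERSON entity with a single [REDACTED] marker.

    tokens is a list of (text_with_ws, ent_iob, ent_type) triples standing in
    for spaCy Token attributes, so no entity merging is needed.
    """
    out = []
    redacting = False  # True while inside a PERSON entity
    for text_ws, iob, ent_type in tokens:
        space = text_ws[len(text_ws.rstrip()):]  # whitespace after the token
        is_person = ent_type == "PERSON" and iob in (BEGIN, INSIDE)
        if is_person and iob == INSIDE and redacting:
            # Same entity continues: keep one marker, use this token's spacing.
            out[-1] = "[REDACTED]" + space
        elif is_person:
            out.append("[REDACTED]" + space)
            redacting = True
        else:
            out.append(text_ws)
            redacting = False
    return "".join(out)


# Hand-written tokens for part of the survey sentence (not real model output).
survey_tokens = [
    ("Out ", OUTSIDE, ""), ("of ", OUTSIDE, ""), ("5 ", BEGIN, "CARDINAL"),
    ("people ", OUTSIDE, ""), ("surveyed", OUTSIDE, ""), (", ", OUTSIDE, ""),
    ("James ", BEGIN, "PERSON"), ("Robert", INSIDE, "PERSON"),
    (", ", OUTSIDE, ""), ("Julie ", BEGIN, "PERSON"),
    ("Fuller ", INSIDE, "PERSON"), ("and ", OUTSIDE, ""),
    ("Benjamin ", BEGIN, "PERSON"), ("Brooks ", INSIDE, "PERSON"),
    ("like ", OUTSIDE, ""), ("apples", OUTSIDE, ""), (".", OUTSIDE, ""),
]
print(redact_people(survey_tokens))
# Out of 5 people surveyed, [REDACTED], [REDACTED] and [REDACTED] like apples.
```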
1146 <h2 id="conclusion">Conclusion</h2>
1147 <p>spaCy is a powerful and advanced library that is gaining huge popularity for NLP applications due to its speed, ease of use, accuracy, and extensibility. Congratulations! You now know:</p>
1148 <ul>
1149 <li>What the foundational terms and concepts in NLP are</li>
1150 <li>How to implement those concepts in spaCy</li>
1151 <li>How to customize and extend built-in functionalities in spaCy</li>
1152 <li>How to perform basic statistical analysis on a text</li>
1153 <li>How to create a pipeline to process unstructured text</li>
1154 <li>How to parse a sentence and extract meaningful insights from it</li>
1155 </ul>
1156         <hr />
1157         <p><em>[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short &amp; sweet Python Trick delivered to your inbox every couple of days. <a href="https://realpython.com/python-tricks/?utm_source=realpython&amp;utm_medium=rss&amp;utm_campaign=footer">&gt;&gt; Click here to learn more and see examples</a> ]</em></p>#
1158 datePublished: #Mon Sep 2 14:00:00 2019#
1159 dateUpdated: #Mon Sep 2 14:00:00 2019#
1160 # Person begin ####################
1161 name: #Real Python#
1162 # Person end ######################
1163 # Item end ########################
1164 # Item begin ######################
1165 id: #https://realpython.com/pycharm-guide/#
1166 title: #PyCharm for Productive Python Development (Guide)#
1167 link: #https://realpython.com/pycharm-guide/#
1168 description: #In this step-by-step tutorial, you&apos;ll learn how you can use PyCharm to be a more productive Python developer. PyCharm makes debugging and visualization easy so you can focus on business logic and just get the job done.#
1169 content: #<p>As a programmer, you should be focused on the business logic and on creating useful applications for your users. <a href="https://www.jetbrains.com/pycharm/">PyCharm</a> by <a href="https://www.jetbrains.com/">JetBrains</a> helps with that and saves you a lot of time by taking care of routine work and by making a number of other tasks, such as debugging and visualization, easy.</p>
1170 <p><strong>In this article, you&rsquo;ll learn about:</strong></p>
1171 <ul>
1172 <li>Installing PyCharm</li>
1173 <li>Writing code in PyCharm</li>
1174 <li>Running your code in PyCharm</li>
1175 <li>Debugging and testing your code in PyCharm</li>
1176 <li>Editing an existing project in PyCharm</li>
1177 <li>Searching and navigating in PyCharm</li>
1178 <li>Using Version Control in PyCharm</li>
1179 <li>Using Plugins and External Tools in PyCharm</li>
1180 <li>Using PyCharm Professional features, such as Django support and Scientific mode</li>
1181 </ul>
1182 <p>This article assumes that you&rsquo;re familiar with Python development and already have some form of Python installed on your system. Python 3.6 will be used for this tutorial. Screenshots and demos provided are for macOS. Because PyCharm runs on all major platforms, you may see slightly different UI elements and may need to modify certain commands.</p>
1183 <div class="alert alert-primary" role="alert">
1184 <p><strong>Note</strong>: </p>
1185 <p>PyCharm comes in three editions: </p>
1186 <ol>
1187 <li><a href="https://www.jetbrains.com/pycharm-edu/">PyCharm Edu</a> is free and intended for educational purposes.</li>
1188 <li><a href="https://www.jetbrains.com/pycharm">PyCharm Community</a> is also free and is intended for pure Python development.</li>
1189 <li><a href="https://www.jetbrains.com/pycharm">PyCharm Professional</a> is paid. It has everything the Community edition has and is also well suited for web and scientific development, with support for frameworks such as Django and Flask, databases and SQL, and scientific tools such as Jupyter.</li>
1190 </ol>
1191 <p>For more details on their differences, check out the <a href="https://www.jetbrains.com/pycharm/features/editions_comparison_matrix.html">PyCharm Editions Comparison Matrix</a> by JetBrains. The company also has <a href="https://www.jetbrains.com/pycharm/buy/#edition=discounts">special offers</a> for students, teachers, open source projects, and other cases.</p>
1192 </div>
1193 <div class="alert alert-warning" role="alert"><p><strong>Clone Repo:</strong> <a href="https://realpython.com/optins/view/alcazar-web-framework/" class="alert-link" data-toggle="modal" data-target="#modal-alcazar-web-framework" data-focus="false">Click here to clone the repo you'll use</a> to explore the project-focused features of PyCharm in this tutorial.</p></div>
1194 
1195 <h2 id="installing-pycharm">Installing PyCharm</h2>
1196 <p>This article will use PyCharm Community Edition 2019.1 because it&rsquo;s free and available on every major platform. Only the section about the professional features will use PyCharm Professional Edition 2019.1.</p>
1197 <p>The recommended way of installing PyCharm is with the <a href="https://www.jetbrains.com/toolbox/app/">JetBrains Toolbox App</a>. With its help, you&rsquo;ll be able to install different JetBrains products or several versions of the same product, update them, roll them back, and easily remove any tool when necessary. You&rsquo;ll also be able to quickly open any project in the right IDE and version.</p>
1198 <p>To install the Toolbox App, refer to the <a href="https://www.jetbrains.com/help/pycharm/installation-guide.html#toolbox">documentation</a> by JetBrains. It will automatically give you the right instructions for your OS. If it doesn&rsquo;t recognize your OS correctly, you can select it from the drop-down list in the top-right section:</p>
1199 <p><a href="https://files.realpython.com/media/pycharm-jetbrains-os-list.231740335aaa.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-jetbrains-os-list.231740335aaa.png" width="1010" height="679" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-jetbrains-os-list.231740335aaa.png&amp;w=252&amp;sig=e331b2eb15a3c8b9396327dedc700bd2bcbbc9e3 252w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-jetbrains-os-list.231740335aaa.png&amp;w=505&amp;sig=4a0a0527b968050fb042a0565f5d6970d72ee1f9 505w, https://files.realpython.com/media/pycharm-jetbrains-os-list.231740335aaa.png 1010w" sizes="75vw" alt="List of OSes in the JetBrains website"/></a></p>
1200 <p>After installing, launch the app and accept the user agreement. Under the <em>Tools</em> tab, you&rsquo;ll see a list of available products. Find PyCharm Community there and click <em>Install</em>:</p>
1201 <p><a href="https://files.realpython.com/media/pycharm-toolbox-installed-pycharm.cdcf1b52bc02.png" target="_blank"><img class="img-fluid mx-auto d-block border w-33" src="https://files.realpython.com/media/pycharm-toolbox-installed-pycharm.cdcf1b52bc02.png" width="337" height="537" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-toolbox-installed-pycharm.cdcf1b52bc02.png&amp;w=84&amp;sig=5f1e571c6c7bed958efddaec87d6ac5168713217 84w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-toolbox-installed-pycharm.cdcf1b52bc02.png&amp;w=168&amp;sig=0a76111939657ae01eaa8203b24c0c2e4fff5ee6 168w, https://files.realpython.com/media/pycharm-toolbox-installed-pycharm.cdcf1b52bc02.png 337w" sizes="75vw" alt="PyCharm installed with the Toolbox app"/></a></p>
1202 <p>Voilà! You have PyCharm available on your machine. If you don&rsquo;t want to use the Toolbox App, then you can also do a <a href="https://www.jetbrains.com/help/pycharm/installation-guide.html#standalone">stand-alone installation of PyCharm</a>.</p>
1203 <p>Launch PyCharm, and you&rsquo;ll see the import settings popup:</p>
1204 <p><a href="https://files.realpython.com/media/pycharm-import-settings-popup.4e360260c697.png" target="_blank"><img class="img-fluid mx-auto d-block w-50" src="https://files.realpython.com/media/pycharm-import-settings-popup.4e360260c697.png" width="416" height="156" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-import-settings-popup.4e360260c697.png&amp;w=104&amp;sig=4920753cc035f162c505253937453e1aa7cc4d26 104w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-import-settings-popup.4e360260c697.png&amp;w=208&amp;sig=b65b1226bcc172a811d7cd1e00cd600408fba092 208w, https://files.realpython.com/media/pycharm-import-settings-popup.4e360260c697.png 416w" sizes="75vw" alt="PyCharm Import Settings Popup"/></a></p>
1205 <p>PyCharm will automatically detect that this is a fresh install and choose <em>Do not import settings</em> for you. Click <em>OK</em>, and PyCharm will ask you to select a keymap scheme. Leave the default and click <em>Next: UI Themes</em> in the bottom right:</p>
1206 <p><a href="https://files.realpython.com/media/pycharm-keymap-scheme.c8115fda9bdd.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pycharm-keymap-scheme.c8115fda9bdd.png" width="805" height="666" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-keymap-scheme.c8115fda9bdd.png&amp;w=201&amp;sig=644595a94c07780a552f76abbfc5fe526b3c9459 201w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-keymap-scheme.c8115fda9bdd.png&amp;w=402&amp;sig=64c348872c1519bc4148e3dbbac2550ed6c0fa30 402w, https://files.realpython.com/media/pycharm-keymap-scheme.c8115fda9bdd.png 805w" sizes="75vw" alt="PyCharm Keymap Scheme"/></a></p>
1207 <p>PyCharm will then ask you to choose between a dark theme called Darcula and a light theme. Choose whichever you prefer and click <em>Next: Launcher Script</em>:</p>
1208 <p><a href="https://files.realpython.com/media/pycharm-set-ui-theme.c48aac8e3fe0.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-set-ui-theme.c48aac8e3fe0.png" width="803" height="666" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-set-ui-theme.c48aac8e3fe0.png&amp;w=200&amp;sig=6998b85afd9e2ca1503624ba55b904f4051f1ffe 200w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-set-ui-theme.c48aac8e3fe0.png&amp;w=401&amp;sig=a6480f0b0073f0680068828fc2f0f5c8ee55cbdb 401w, https://files.realpython.com/media/pycharm-set-ui-theme.c48aac8e3fe0.png 803w" sizes="75vw" alt="PyCharm Set UI Theme Page"/></a></p>
1209 <p>I&rsquo;ll be using the dark theme Darcula throughout this tutorial. You can find and install other themes as <a href="#using-plugins-and-external-tools-in-pycharm">plugins</a>, or you can also <a href="https://blog.codota.com/5-best-intellij-themes/">import them</a>.</p>
1210 <p>On the next page, leave the defaults and click <em>Next: Featured plugins</em>. There, PyCharm will show you a list of popular plugins that you may want to install. Click <em>Start using PyCharm</em>, and now you are ready to write some code!</p>
1211 <h2 id="writing-code-in-pycharm">Writing Code in PyCharm</h2>
1212 <p>In PyCharm, you do everything in the context of a <strong>project</strong>. Thus, the first thing you need to do is create one.</p>
1213 <p>After installing and opening PyCharm, you are on the welcome screen. Click <em>Create New Project</em>, and you&rsquo;ll see the <em>New Project</em> popup:</p>
1214 <p><a href="https://files.realpython.com/media/pycharm-new-project.cc35f3aa1056.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-new-project.cc35f3aa1056.png" width="664" height="480" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-new-project.cc35f3aa1056.png&amp;w=166&amp;sig=6423b68127eae8ca93165323df4884844265f5e3 166w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-new-project.cc35f3aa1056.png&amp;w=332&amp;sig=50d660be9a42904b1161bd76df1ab9ddd77e2132 332w, https://files.realpython.com/media/pycharm-new-project.cc35f3aa1056.png 664w" sizes="75vw" alt="New Project in PyCharm"/></a></p>
1215 <p>Specify the project location and expand the <em>Project Interpreter</em> drop-down. Here, you can either create a new project interpreter or reuse an existing one. Choose <em>New environment using</em>. Right next to it is a drop-down list where you can select <em>Virtualenv</em>, <em>Pipenv</em>, or <em>Conda</em>. These tools keep the dependencies of different projects separate by creating an isolated Python environment for each project.</p>
1216 <p>You are free to select whichever you like, but <em>Virtualenv</em> is used for this tutorial. If you want, you can specify the environment location and choose the base interpreter from the list of Python interpreters (such as Python 2.7 and Python 3.6) installed on your system. Usually, the defaults are fine. There are also two checkboxes: one to inherit global site-packages into your new environment and one to make the environment available to all other projects. Leave both unselected.</p>
1217 <p>Click <em>Create</em> in the bottom right, and you&rsquo;ll see the new project created:</p>
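<p>The standard library&rsquo;s <code>venv</code> module can create a comparable environment outside the IDE. This is a swapped-in sketch, not what PyCharm runs internally. It uses <code>venv</code> instead of the third-party <code>virtualenv</code> tool, skips installing pip, and creates the environment in a throwaway temporary directory. Its <code>system_site_packages</code> flag plays the same role as the checkbox for inheriting global site-packages.</p>

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as project_dir:
    env_dir = Path(project_dir) / "venv"  # arbitrary environment location
    builder = venv.EnvBuilder(
        system_site_packages=False,  # like leaving the checkbox unselected
        with_pip=False,              # skip pip so the sketch runs offline
    )
    builder.create(str(env_dir))
    # pyvenv.cfg records the settings the environment was created with
    config = (env_dir / "pyvenv.cfg").read_text()

print(config)
```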
1218 <p><a href="https://files.realpython.com/media/pycharm-project-created.99dffd1d4e9a.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-project-created.99dffd1d4e9a.png" width="1174" height="734" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-project-created.99dffd1d4e9a.png&amp;w=293&amp;sig=d6394f174acab8ee63eb6ce0360d0174857f7afb 293w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-project-created.99dffd1d4e9a.png&amp;w=587&amp;sig=2fd8ea44cc0ad15f015aab687018ec7ad8861a53 587w, https://files.realpython.com/media/pycharm-project-created.99dffd1d4e9a.png 1174w" sizes="75vw" alt="Project created in PyCharm"/></a></p>
1219 <p>You will also see a small <em>Tip of the Day</em> popup where PyCharm gives you one trick to learn at each startup. Go ahead and close this popup.</p>
1220 <p>It is now time to start a new Python program. Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-n">N</kbd></span> if you are on macOS or <span class="keys"><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-insert">Ins</kbd></span> if you are on Windows or Linux. Then, choose <em>Python File</em>. You can also select <em>File → New</em> from the menu. Name the new file <code>guess_game.py</code> and click <em>OK</em>. You will see a PyCharm window similar to the following:</p>
1221 <p><a href="https://files.realpython.com/media/pycharm-new-file.7ea9902d73ea.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-new-file.7ea9902d73ea.png" width="1172" height="734" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-new-file.7ea9902d73ea.png&amp;w=293&amp;sig=b1ee432e97d642aea67818cc7280971247196a62 293w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-new-file.7ea9902d73ea.png&amp;w=586&amp;sig=3563b00140a3edd3f1df0e522d5069f6efcb62d3 586w, https://files.realpython.com/media/pycharm-new-file.7ea9902d73ea.png 1172w" sizes="75vw" alt="PyCharm New File"/></a></p>
1222 <p>For our test code, let&rsquo;s quickly code up a simple guessing game in which the program chooses a number that the user has to guess. For every guess, the program will tell the user whether the guess was smaller or bigger than the secret number. The game ends when the user guesses the number. Here&rsquo;s the code for the game:</p>
<div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">from</span> <span class="nn">random</span> <span class="k">import</span> <span class="n">randint</span>
<span class="lineno"> 2 </span>
<span class="lineno"> 3 </span>
<span class="lineno"> 4 </span><span class="k">def</span> <span class="nf">play</span><span class="p">():</span>
<span class="lineno"> 5 </span>    <span class="n">random_int</span> <span class="o">=</span> <span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="lineno"> 6 </span>
<span class="lineno"> 7 </span>    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="lineno"> 8 </span>        <span class="n">user_guess</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">input</span><span class="p">(</span><span class="s2">&quot;What number did we guess (0-100)?&quot;</span><span class="p">))</span>
<span class="lineno"> 9 </span>
<span class="lineno">10 </span>        <span class="k">if</span> <span class="n">user_guess</span> <span class="o">==</span> <span class="n">randint</span><span class="p">:</span>
<span class="lineno">11 </span>            <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="s2">&quot;You found the number (</span><span class="si">{random_int}</span><span class="s2">). Congrats!&quot;</span><span class="p">)</span>
<span class="lineno">12 </span>            <span class="k">break</span>
<span class="lineno">13 </span>
<span class="lineno">14 </span>        <span class="k">if</span> <span class="n">user_guess</span> <span class="o">&lt;</span> <span class="n">random_int</span><span class="p">:</span>
<span class="lineno">15 </span>            <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Your number is less than the number we guessed.&quot;</span><span class="p">)</span>
<span class="lineno">16 </span>            <span class="k">continue</span>
<span class="lineno">17 </span>
<span class="lineno">18 </span>        <span class="k">if</span> <span class="n">user_guess</span> <span class="o">&gt;</span> <span class="n">random_int</span><span class="p">:</span>
<span class="lineno">19 </span>            <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Your number is more than the number we guessed.&quot;</span><span class="p">)</span>
<span class="lineno">20 </span>            <span class="k">continue</span>
<span class="lineno">21 </span>
<span class="lineno">22 </span>
<span class="lineno">23 </span><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span>
<span class="lineno">24 </span>    <span class="n">play</span><span class="p">()</span>
</pre></div>
1247 
1248 <p>Type this code directly rather than copying and pasting. You&rsquo;ll see something like this:</p>
1249 <p><a href="https://files.realpython.com/media/typing-guess-game.fcaedeb8ece2.gif" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/typing-guess-game.fcaedeb8ece2.gif" width="528" height="480" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/typing-guess-game.fcaedeb8ece2.gif&amp;w=132&amp;sig=7e5eb20fb9ae97b1cea80380f9ad00f35dd76707 132w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/typing-guess-game.fcaedeb8ece2.gif&amp;w=264&amp;sig=eb242bca301203a38741a986d7847cf0f3ef4cff 264w, https://files.realpython.com/media/typing-guess-game.fcaedeb8ece2.gif 528w" sizes="75vw" alt="Typing Guessing Game"/></a></p>
<p>As you can see, PyCharm provides <a href="https://www.jetbrains.com/pycharm/features/coding_assistance.html">Intelligent Coding Assistance</a> with code completion, code inspections, on-the-fly error highlighting, and quick-fix suggestions. In particular, note how when you typed <code>main</code> and then hit <span class="keys"><kbd class="key-tab">Tab</kbd></span>, PyCharm auto-completed the whole <code>__main__</code> clause for you.</p>
<p>Also note that if you forget to type <code>if</code> before a condition, you can append <code>.if</code> to it and hit <span class="keys"><kbd class="key-tab">Tab</kbd></span>, and PyCharm turns it into an <code>if</code> clause for you. The same works with <code>True.while</code>, which becomes <code>while True:</code>. That&rsquo;s <a href="https://www.jetbrains.com/help/pycharm/settings-postfix-completion.html">PyCharm&rsquo;s Postfix completions</a> at work, saving you from moving the caret backward.</p>
1252 <h2 id="running-code-in-pycharm">Running Code in PyCharm</h2>
1253 <p>Now that you&rsquo;ve coded up the game, it&rsquo;s time for you to run it.</p>
1254 <p>You have three ways of running this program:</p>
1255 <ol>
1256 <li>Use the shortcut <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-r">R</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-f10">F10</kbd></span> on Windows or Linux.</li>
1257 <li>Right-click the background and choose <em>Run &lsquo;guess_game&rsquo;</em> from the menu.</li>
1258 <li>Since this program has the <code>__main__</code> clause, you can click on the little green arrow to the left of the <code>__main__</code> clause and choose <em>Run &lsquo;guess_game&rsquo;</em> from there.</li>
1259 </ol>
<p>Use any one of the options above to run the program, and you&rsquo;ll see the Run tool window appear at the bottom of the screen, showing your program&rsquo;s output:</p>
1261 <p><a href="https://files.realpython.com/media/pycharm-running-script.33fb830f45b4.gif" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-running-script.33fb830f45b4.gif" width="1068" height="720" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-running-script.33fb830f45b4.gif&amp;w=267&amp;sig=44be962297881f8ae66557c19905a55202ee14de 267w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-running-script.33fb830f45b4.gif&amp;w=534&amp;sig=000552999cea7a9ce79eeec2f9db7361e9585d77 534w, https://files.realpython.com/media/pycharm-running-script.33fb830f45b4.gif 1068w" sizes="75vw" alt="Running a script in PyCharm"/></a></p>
<p>Play the game for a little bit to see if you can find the secret number. Pro tip: start with 50.</p>
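<p>Starting with 50 works because it splits the range of possible numbers in half, and each hint lets you discard the half that can&rsquo;t contain the answer. That&rsquo;s a binary search. Here&rsquo;s a minimal sketch of the strategy, assuming the game&rsquo;s 0 to 100 range; <code>guesses_needed()</code> is a hypothetical helper, not part of the game:</p>

```python
def guesses_needed(secret, low=0, high=100):
    """Count how many guesses the halving strategy needs to find `secret`."""
    guesses = 0
    while low <= high:
        guess = (low + high) // 2  # the first guess for 0-100 is 50
        guesses += 1
        if guess == secret:
            return guesses
        if guess < secret:
            low = guess + 1  # "less than" hint: keep only the upper half
        else:
            high = guess - 1  # "more than" hint: keep only the lower half
    raise ValueError("secret is outside the range")


print(guesses_needed(50))  # found on the very first guess
print(max(guesses_needed(n) for n in range(101)))  # worst case over 0-100
```

<p>With this strategy, any number from 0 to 100 takes at most seven guesses.</p>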
1263 <h2 id="debugging-in-pycharm">Debugging in PyCharm</h2>
1264 <p>Did you find the number? If so, you may have seen something weird after you found the number. Instead of printing the congratulations message and exiting, the program seems to start over. That&rsquo;s a bug right there. To discover why the program starts over, you&rsquo;ll now debug the program.</p>
1265 <p>First, place a breakpoint by clicking on the blank space to the left of line number 8:</p>
1266 <p><a href="https://files.realpython.com/media/pycharm-debug-breakpoint.55cf93c49859.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-debug-breakpoint.55cf93c49859.png" width="1042" height="710" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debug-breakpoint.55cf93c49859.png&amp;w=260&amp;sig=e714eeae34fad6c0e5889bee0f236f9c30e100a0 260w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debug-breakpoint.55cf93c49859.png&amp;w=521&amp;sig=7c4692a273d49a627785a715fd0a08d4c8120649 521w, https://files.realpython.com/media/pycharm-debug-breakpoint.55cf93c49859.png 1042w" sizes="75vw" alt="Debug breakpoint in PyCharm"/></a></p>
<p>The program will be suspended at this point, and you can explore what went wrong from there. Next, choose one of the following three ways to start debugging:</p>
1268 <ol>
1269 <li>Press <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-d">D</kbd></span> on Mac or <span class="keys"><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-f9">F9</kbd></span> on Windows or Linux.</li>
1270 <li>Right-click the background and choose <em>Debug &lsquo;guess_game&rsquo;</em>.</li>
<li>Click on the little green arrow to the left of the <code>__main__</code> clause and choose <em>Debug &lsquo;guess_game&rsquo;</em> from there.</li>
1272 </ol>
1273 <p>Afterwards, you&rsquo;ll see a <em>Debug</em> window open at the bottom:</p>
1274 <p><a href="https://files.realpython.com/media/pycharm-debugging-start.04246b743469.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-debugging-start.04246b743469.png" width="1043" height="711" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debugging-start.04246b743469.png&amp;w=260&amp;sig=cea78f8df9a7f183330e3610c90a2abeab879923 260w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debugging-start.04246b743469.png&amp;w=521&amp;sig=6ae0fe4515cf7f0d52cbcda986eb9c52d0a602ee 521w, https://files.realpython.com/media/pycharm-debugging-start.04246b743469.png 1043w" sizes="75vw" alt="Start of debugging in PyCharm"/></a></p>
1275 <p>Follow the steps below to debug the program:</p>
1276 <ol>
1277 <li>
1278 <p>Notice that the current line is highlighted in blue.</p>
1279 </li>
1280 <li>
1281 <p>See that <code>random_int</code> and its value are listed in the Debug window. Make a note of this number. (In the picture, the number is 85.)</p>
1282 </li>
1283 <li>
1284 <p>Hit <span class="keys"><kbd class="key-f8">F8</kbd></span> to execute the current line and step <em>over</em> to the next one. You can also use <span class="keys"><kbd class="key-f7">F7</kbd></span> to step <em>into</em> the function in the current line, if necessary. As you continue executing the statements, the changes in the variables will be automatically reflected in the Debugger window.</p>
1285 </li>
1286 <li>
<p>Notice the Console tab right next to the Debugger tab. Only one of the two tabs is visible at a time: you interact with your program in the Console tab and perform debugging actions in the Debugger tab.</p>
1288 </li>
1289 <li>
1290 <p>Switch to the Console tab to enter your guess.</p>
1291 </li>
1292 <li>
<p>Type the number shown for <code>random_int</code>, and then hit <span class="keys"><kbd class="key-enter">Enter</kbd></span>.</p>
1294 </li>
1295 <li>
1296 <p>Switch back to the Debugger tab.</p>
1297 </li>
1298 <li>
<p>Hit <span class="keys"><kbd class="key-f8">F8</kbd></span> again to evaluate the <code>if</code> statement. Notice that you are now on line 14. But wait a minute! Why didn&rsquo;t it go to line 11? The reason is that the <code>if</code> statement on line 10 evaluated to <code>False</code>. But why did it evaluate to <code>False</code> when you entered the number that was chosen?</p>
1300 </li>
1301 <li>
<p>Look carefully at line 10 and notice that we are comparing <code>user_guess</code> with the wrong thing. Instead of comparing it with <code>random_int</code>, we are comparing it with <code>randint</code>, the function imported from the <code>random</code> module. An integer is never equal to a function object, so this comparison is always <code>False</code>.</p>
1303 </li>
1304 <li>
<p>Change it to <code>random_int</code>, restart the debugger, and follow the same steps again. This time, line 10 will evaluate to <code>True</code>, and execution will go to line 11:</p>
1306 </li>
1307 </ol>
1308 <p><a href="https://files.realpython.com/media/pycharm-debugging-scripts.bb5a077da438.gif" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-debugging-scripts.bb5a077da438.gif" width="1092" height="720" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debugging-scripts.bb5a077da438.gif&amp;w=273&amp;sig=fc5de269fbc13ea5c1d8be4ca7f525e04a4bb68c 273w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-debugging-scripts.bb5a077da438.gif&amp;w=546&amp;sig=4719f69fd6e78ed3e68f16a9355d00d0edd85a26 546w, https://files.realpython.com/media/pycharm-debugging-scripts.bb5a077da438.gif 1092w" sizes="75vw" alt="Debugging Script in PyCharm"/></a></p>
1309 <p>Congratulations! You fixed the bug.</p>
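<p>For reference, here&rsquo;s one way the game could look with the bug fixed. This is a sketch rather than the tutorial&rsquo;s code: the <code>secret</code> and <code>read_guess</code> parameters are hypothetical additions that let you drive the game from a test instead of the keyboard, and the <code>continue</code> statements are folded into an <code>if</code>/<code>else</code>:</p>

```python
from random import randint


def play(secret=None, read_guess=input):
    """Play one round of the guessing game and return the number of guesses.

    `secret` and `read_guess` are hypothetical parameters added for testing;
    with the defaults, the game picks a random number and reads from the keyboard.
    """
    if secret is None:
        secret = randint(0, 100)

    attempts = 0
    while True:
        user_guess = int(read_guess("What number did we guess (0-100)? "))
        attempts += 1

        # The fix: compare the guess with the chosen number, not with randint.
        if user_guess == secret:
            print(f"You found the number ({secret}). Congrats!")
            return attempts

        if user_guess < secret:
            print("Your number is less than the number we guessed.")
        else:
            print("Your number is more than the number we guessed.")
```

<p>To run it as a script, keep the <code>if __name__ == '__main__':</code> clause from the original and call <code>play()</code> inside it.</p>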
1310 <h2 id="testing-in-pycharm">Testing in PyCharm</h2>
1311 <p>No application is reliable without unit tests. PyCharm helps you write and run them very quickly and comfortably. By default, <a href="https://docs.python.org/3/library/unittest.html"><code>unittest</code></a> is used as the test runner, but PyCharm also supports other testing frameworks such as <a href="http://www.pytest.org/en/latest/"><code>pytest</code></a>, <a href="https://nose.readthedocs.io/en/latest/"><code>nose</code></a>, <a href="https://docs.python.org/3/library/doctest.html"><code>doctest</code></a>, <a href="https://www.jetbrains.com/help/pycharm/tox-support.html"><code>tox</code></a>, and <a href="https://twistedmatrix.com/trac/wiki/TwistedTrial"><code>trial</code></a>. You can, for example, enable <code>pytest</code> for your project like this:</p>
1312 <ol>
<li>Open the <em>Settings/Preferences &rarr; Tools &rarr; Python Integrated Tools</em> settings dialog.</li>
1314 <li>Select <code>pytest</code> in the Default test runner field.</li>
1315 <li>Click <em>OK</em> to save the settings. </li>
1316 </ol>
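<p>With <code>pytest</code> selected as the runner, tests don&rsquo;t need to subclass <code>TestCase</code>: pytest collects plain functions whose names start with <code>test_</code> and reports any failing <code>assert</code> statement. Here&rsquo;s a minimal sketch, assuming the <code>Calculator</code> class you&rsquo;ll create in <code>calculator.py</code> below; it&rsquo;s copied inline so the snippet runs on its own:</p>

```python
class Calculator:
    # Inline copy of the class from calculator.py; in the project you would
    # write `from calculator import Calculator` instead.
    def add(self, a, b):
        return a + b

    def multiply(self, a, b):
        return a * b


# pytest discovers functions named test_* and uses plain assert statements.
def test_add():
    assert Calculator().add(3, 4) == 7


def test_multiply():
    assert Calculator().multiply(3, 4) == 12
```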
1317 <p>For this example, we&rsquo;ll be using the default test runner <code>unittest</code>. </p>
1318 <p>In the same project, create a file called <code>calculator.py</code> and put the following <code>Calculator</code> class in it:</p>
1319 <div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="k">class</span> <span class="nc">Calculator</span><span class="p">:</span>
1320 <span class="lineno"> 2 </span>    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
1321 <span class="lineno"> 3 </span>        <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
1322 <span class="lineno"> 4 </span>
1323 <span class="lineno"> 5 </span>    <span class="k">def</span> <span class="nf">multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
1324 <span class="lineno"> 6 </span>        <span class="k">return</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
1325 </pre></div>
1326 
<p>PyCharm makes it very easy to create tests for your existing code. With the <code>calculator.py</code> file open, do any one of the following:</p>
1328 <ul>
1329 <li>Press <span class="keys"><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-t">T</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-t">T</kbd></span> on Windows or Linux.</li>
1330 <li>Right-click in the background of the class and then choose <em>Go To</em> and <em>Test</em>.</li>
<li>On the main menu, choose <em>Navigate &rarr; Test</em>.</li>
1332 </ul>
1333 <p>Choose <em>Create New Test&hellip;</em>, and you will see the following window:</p>
1334 <p><a href="https://files.realpython.com/media/pycharm-create-tests.9a6cea78f9c6.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-create-tests.9a6cea78f9c6.png" width="500" height="402" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-create-tests.9a6cea78f9c6.png&amp;w=125&amp;sig=0c50b83f35578fd8004dce9e7d55fcd3b09a1967 125w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-create-tests.9a6cea78f9c6.png&amp;w=250&amp;sig=f6740142e71d2d2f6196ffbe341ae09bc9a64453 250w, https://files.realpython.com/media/pycharm-create-tests.9a6cea78f9c6.png 500w" sizes="75vw" alt="Create tests in PyCharm"/></a></p>
<p>Leave the defaults of <em>Target directory</em>, <em>Test file name</em>, and <em>Test class name</em>. Select both of the methods and click <em>OK</em>. Voil&agrave;! PyCharm automatically creates a file called <code>test_calculator.py</code> containing the following stub tests:</p>
1336 <div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">from</span> <span class="nn">unittest</span> <span class="k">import</span> <span class="n">TestCase</span>
1337 <span class="lineno"> 2 </span>
1338 <span class="lineno"> 3 </span><span class="k">class</span> <span class="nc">TestCalculator</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
1339 <span class="lineno"> 4 </span>    <span class="k">def</span> <span class="nf">test_add</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
1340 <span class="lineno"> 5 </span>        <span class="bp">self</span><span class="o">.</span><span class="n">fail</span><span class="p">()</span>
1341 <span class="lineno"> 6 </span>
1342 <span class="lineno"> 7 </span>    <span class="k">def</span> <span class="nf">test_multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
1343 <span class="lineno"> 8 </span>        <span class="bp">self</span><span class="o">.</span><span class="n">fail</span><span class="p">()</span>
1344 </pre></div>
1345 
1346 <p>Run the tests using one of the methods below:</p>
1347 <ul>
1348 <li>Press <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-r">R</kbd></span> on Mac or <span class="keys"><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-f10">F10</kbd></span> on Windows or Linux.</li>
1349 <li>Right-click the background and choose <em>Run &lsquo;Unittests for test_calculator.py&rsquo;</em>.</li>
1350 <li>Click on the little green arrow to the left of the test class name and choose <em>Run &lsquo;Unittests for test_calculator.py&rsquo;</em>.</li>
1351 </ul>
<p>You&rsquo;ll see the test results window open at the bottom, with all the tests failing:</p>
1353 <p><a href="https://files.realpython.com/media/pycharm-failed-tests.810aa9c365cb.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-failed-tests.810aa9c365cb.png" width="972" height="645" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-failed-tests.810aa9c365cb.png&amp;w=243&amp;sig=cb7ef285c20ed83b9771a91cc38d77342e4d3745 243w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-failed-tests.810aa9c365cb.png&amp;w=486&amp;sig=aaec55471da2d168048783b9c4e573f5ce876de4 486w, https://files.realpython.com/media/pycharm-failed-tests.810aa9c365cb.png 972w" sizes="75vw" alt="Failed tests in PyCharm"/></a></p>
<p>Notice that you have the hierarchy of the test results on the left and the console output on the right.</p>
1355 <p>Now, implement <code>test_add</code> by changing the code to the following:</p>
1356 <div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">from</span> <span class="nn">unittest</span> <span class="k">import</span> <span class="n">TestCase</span>
1357 <span class="lineno"> 2 </span>
1358 <span class="lineno"> 3 </span><span class="kn">from</span> <span class="nn">calculator</span> <span class="k">import</span> <span class="n">Calculator</span>
1359 <span class="lineno"> 4 </span>
1360 <span class="lineno"> 5 </span><span class="k">class</span> <span class="nc">TestCalculator</span><span class="p">(</span><span class="n">TestCase</span><span class="p">):</span>
1361 <span class="lineno"> 6 </span>    <span class="k">def</span> <span class="nf">test_add</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
1362 <span class="lineno"> 7 </span>        <span class="bp">self</span><span class="o">.</span><span class="n">calculator</span> <span class="o">=</span> <span class="n">Calculator</span><span class="p">()</span>
1363 <span class="lineno"> 8 </span>        <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">calculator</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span> <span class="mi">7</span><span class="p">)</span>
1364 <span class="lineno"> 9 </span>
1365 <span class="lineno">10 </span>    <span class="k">def</span> <span class="nf">test_multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
1366 <span class="lineno">11 </span>        <span class="bp">self</span><span class="o">.</span><span class="n">fail</span><span class="p">()</span>
1367 </pre></div>
1368 
1369 <p>Run the tests again, and you&rsquo;ll see that one test passed and the other failed. Explore the options to show passed tests, to show ignored tests, to sort tests alphabetically, and to sort tests by duration:</p>
1370 <p><a href="https://files.realpython.com/media/pycharm-running-tests.6077562207ba.gif" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-running-tests.6077562207ba.gif" width="1092" height="720" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-running-tests.6077562207ba.gif&amp;w=273&amp;sig=e2238425e1cfb9a9a298741244f3021f3984dbf8 273w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-running-tests.6077562207ba.gif&amp;w=546&amp;sig=85203afd5ff2189fdfadad0960da514b03063953 546w, https://files.realpython.com/media/pycharm-running-tests.6077562207ba.gif 1092w" sizes="75vw" alt="Running tests in PyCharm"/></a></p>
<p>Note that the <code>sleep(0.1)</code> call you see in the GIF above was added intentionally to make one of the tests slower so that sorting by duration has a visible effect.</p>
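<p>If you want to finish the exercise, here&rsquo;s a sketch of a complete test module. It goes beyond the steps above: it implements <code>test_multiply</code> too and moves the <code>Calculator</code> creation into <code>setUp()</code>, which <code>unittest</code> runs before each test method. The class is copied inline so the snippet runs on its own; in the project you&rsquo;d keep the <code>from calculator import Calculator</code> import instead:</p>

```python
from unittest import TestCase


class Calculator:
    # Inline copy of calculator.py so this sketch is self-contained.
    def add(self, a, b):
        return a + b

    def multiply(self, a, b):
        return a * b


class TestCalculator(TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a fresh instance.
        self.calculator = Calculator()

    def test_add(self):
        self.assertEqual(self.calculator.add(3, 4), 7)

    def test_multiply(self):
        self.assertEqual(self.calculator.multiply(3, 4), 12)
```

<p>Run it from PyCharm the same way as before, and both tests should pass.</p>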
1372 <h2 id="editing-an-existing-project-in-pycharm">Editing an Existing Project in PyCharm</h2>
<p>Single-file projects like these are great for examples, but you&rsquo;ll often work on much larger projects over a longer period of time. In this section, you&rsquo;ll take a look at how PyCharm works with a larger project.</p>
1374 <p>To explore the project-focused features of PyCharm, you&rsquo;ll use the Alcazar web framework that was built for learning purposes. To continue following along, clone the repo locally:</p>
1375 <div class="alert alert-warning" role="alert"><p><strong>Clone Repo:</strong> <a href="https://realpython.com/optins/view/alcazar-web-framework/" class="alert-link" data-toggle="modal" data-target="#modal-alcazar-web-framework" data-focus="false">Click here to clone the repo you'll use</a> to explore the project-focused features of PyCharm in this tutorial.</p></div>
1376 
1377 <p>Once you have a project locally, open it in PyCharm using one of the following methods:</p>
1378 <ul>
<li>Click <em>File &rarr; Open</em> on the main menu.</li>
1380 <li>Click <em>Open</em> on the <a href="https://www.jetbrains.com/help/pycharm/welcome-screen.html">Welcome Screen</a> if you are there.</li>
1381 </ul>
1382 <p>After either of these steps, find the folder containing the project on your computer and open it.</p>
1383 <p>If this project contains a <a href="https://realpython.com/python-virtual-environments-a-primer/">virtual environment</a>, then PyCharm will automatically use this virtual environment and make it the project interpreter.</p>
1384 <p>If you need to configure a different <code>virtualenv</code>, then open <em>Preferences</em> on Mac by pressing <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-comma">,</kbd></span> or <em>Settings</em> on Windows or Linux by pressing <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-s">S</kbd></span> and find the <em>Project: ProjectName</em> section. Open the drop-down and choose <em>Project Interpreter</em>:</p>
1385 <p><a href="https://files.realpython.com/media/pycharm-project-interpreter.57282306555a.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-project-interpreter.57282306555a.png" width="1083" height="723" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-project-interpreter.57282306555a.png&amp;w=270&amp;sig=286643bc473f648bbcce27338c980eb023746ac2 270w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-project-interpreter.57282306555a.png&amp;w=541&amp;sig=6d33c5adf0380e8c5b01ae9430436983580b4b49 541w, https://files.realpython.com/media/pycharm-project-interpreter.57282306555a.png 1083w" sizes="75vw" alt="Project interpreter in PyCharm"/></a></p>
1386 <p>Choose the <code>virtualenv</code> from the drop-down list. If it&rsquo;s not there, then click on the settings button to the right of the drop-down list and then choose <em>Add&hellip;</em>. The rest of the steps should be the same as when we were <a href="#writing-code-in-pycharm">creating a new project</a>.</p>
1387 <h2 id="searching-and-navigating-in-pycharm">Searching and Navigating in PyCharm</h2>
<p>In a big project where it&rsquo;s difficult for a single person to remember where everything is located, it&rsquo;s very important to be able to quickly navigate and find what you&rsquo;re looking for. PyCharm has you covered here as well. Use the project you opened in the section above to practice these shortcuts:</p>
1389 <ul>
1390 <li><strong>Searching for a fragment in the current file:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-f">F</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-f">F</kbd></span> on Windows or Linux.</li>
1391 <li><strong>Searching for a fragment in the entire project:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-f">F</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-f">F</kbd></span> on Windows or Linux.</li>
1392 <li><strong>Searching for a class:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-o">O</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-n">N</kbd></span> on Windows or Linux.</li>
1393 <li><strong>Searching for a file:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-o">O</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-n">N</kbd></span> on Windows or Linux.</li>
<li><strong>Searching everywhere when you don&rsquo;t know whether it&rsquo;s a file, a class, or a code fragment that you&rsquo;re looking for:</strong> Press <span class="keys"><kbd class="key-shift">Shift</kbd></span> twice.</li>
1395 </ul>
1396 <p>As for the navigation, the following shortcuts may save you a lot of time:</p>
1397 <ul>
1398 <li><strong>Going to the declaration of a variable:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd></span> on Windows or Linux, and click on the variable.</li>
1399 <li><strong>Finding usages of a class, a method, or any symbol:</strong> Press <span class="keys"><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-f7">F7</kbd></span>.</li>
<li><strong>Seeing your recent changes:</strong> Press <span class="keys"><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-c">C</kbd></span> or go to <em>View &rarr; Recent Changes</em> on the main menu.</li>
<li><strong>Seeing your recent files:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-e">E</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-e">E</kbd></span> on Windows or Linux, or go to <em>View &rarr; Recent Files</em> on the main menu.</li>
1402 <li><strong>Going backward and forward through your history of navigation after you jumped around:</strong> Press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-bracket-left">[</kbd></span> / <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-bracket-right">]</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-arrow-left">Left</kbd></span> / <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-arrow-right">Right</kbd></span> on Windows or Linux.</li>
1403 </ul>
1404 <p>For more details, see the <a href="https://www.jetbrains.com/help/pycharm/tutorial-exploring-navigation-and-search.html">official documentation</a>. </p>
1405 <h2 id="using-version-control-in-pycharm">Using Version Control in PyCharm</h2>
<p>Version control systems such as <a href="https://git-scm.com/">Git</a> and <a href="https://www.mercurial-scm.org/">Mercurial</a> are some of the most important tools in modern software development, so it&rsquo;s essential for an IDE to support them. PyCharm does this very well by integrating with many popular version control systems, such as Git (and <a href="https://github.com/">GitHub</a>), Mercurial, <a href="https://www.perforce.com/solutions/version-control">Perforce</a>, and <a href="https://subversion.apache.org/">Subversion</a>.</p>
1407 <div class="alert alert-primary" role="alert">
1408 <p><strong>Note</strong>: <a href="https://realpython.com/python-git-github-intro/">Git</a> is used for the following examples.</p>
1409 </div>
1410 <h3 id="configuring-vcs">Configuring VCS</h3>
<p>To enable VCS integration, go to <em>VCS &rarr; VCS Operations Popup&hellip;</em> from the menu at the top or press <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-v">V</kbd></span> on Mac or <span class="keys"><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-grave">`</kbd></span> on Windows or Linux. Choose <em>Enable Version Control Integration&hellip;</em>. You&rsquo;ll see the following window open:</p>
1412 <p><a href="https://files.realpython.com/media/pycharm-enable-vc-integration.b30ec94c1246.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-enable-vc-integration.b30ec94c1246.png" width="715" height="147" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-enable-vc-integration.b30ec94c1246.png&amp;w=178&amp;sig=7ef55e4ed6068c86d831adaefc1af11f4c083763 178w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-enable-vc-integration.b30ec94c1246.png&amp;w=357&amp;sig=c701d809b49e9837477c14e8dfe34dc4cfa66c33 357w, https://files.realpython.com/media/pycharm-enable-vc-integration.b30ec94c1246.png 715w" sizes="75vw" alt="Enable Version Control Integration in PyCharm"/></a></p>
<p>Choose <em>Git</em> from the drop-down list and click <em>OK</em>, and VCS is enabled for your project. Note that if you open an existing project that already uses version control, PyCharm detects it and enables the integration automatically.</p>
<p>Now, if you go to the <em>VCS Operations Popup&hellip;</em>, you&rsquo;ll see a different popup with options to run <code>git add</code>, <code>git stash</code>, <code>git branch</code>, <code>git commit</code>, <code>git push</code>, and more:</p>
1415 <p><a href="https://files.realpython.com/media/pycharm-vcs-operations.70dbafcb983a.png" target="_blank"><img class="img-fluid mx-auto d-block border w-50" src="https://files.realpython.com/media/pycharm-vcs-operations.70dbafcb983a.png" width="392" height="379" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-vcs-operations.70dbafcb983a.png&amp;w=98&amp;sig=f285b015c957936448441c4ec8b03cf8627cdffc 98w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-vcs-operations.70dbafcb983a.png&amp;w=196&amp;sig=a16c449a21dde2a2b0f2f7533c92e71574cd3d60 196w, https://files.realpython.com/media/pycharm-vcs-operations.70dbafcb983a.png 392w" sizes="75vw" alt="VCS operations in PyCharm"/></a></p>
1416 <p>If you can&rsquo;t find what you need there, you&rsquo;ll probably find it by going to <em>VCS</em> in the top menu and choosing <em>Git</em>, where you can even create and view pull requests.</p>
1417 <h3 id="committing-and-conflict-resolution">Committing and Conflict Resolution</h3>
1418 <p>These are two features of VCS integration in PyCharm that I personally use and enjoy a lot! Let&rsquo;s say you have finished your work and want to commit it. Go to <em>VCS → VCS Operations Popup&hellip; → Commit&hellip;</em> or press <span class="keys"><kbd class="key-command">Cmd</kbd><span>+</span><kbd class="key-k">K</kbd></span> on Mac or <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-k">K</kbd></span> on Windows or Linux. You&rsquo;ll see the following window open:</p>
1419 <p><a href="https://files.realpython.com/media/pycharm-commit-window.a4ceff16c2d3.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-commit-window.a4ceff16c2d3.png" width="929" height="682" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-commit-window.a4ceff16c2d3.png&amp;w=232&amp;sig=935dabf7a28cf757a5c87165e3da494540c3e4a6 232w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-commit-window.a4ceff16c2d3.png&amp;w=464&amp;sig=7a05fe8a5cbfb1c3524c24515f9a8d7fa3211591 464w, https://files.realpython.com/media/pycharm-commit-window.a4ceff16c2d3.png 929w" sizes="75vw" alt="Commit window in PyCharm"/></a></p>
1420 <p>In this window, you can do the following:</p>
1421 <ol>
1422 <li>Choose which files to commit</li>
1423 <li>Write your commit message</li>
1424 <li>Do all kinds of checks and cleanup <a href="https://www.jetbrains.com/help/idea/commit-changes-dialog.html#before_commit">before commit</a></li>
1425 <li>Review the diff of your changes</li>
1426 <li>Commit and push at once by pressing the arrow to the right of the <em>Commit</em> button at the bottom right and choosing <em>Commit and Push&hellip;</em></li>
1427 </ol>
1428 <p>It can feel magical and fast, especially if you&rsquo;re used to doing everything manually on the command line.</p>
1429 <p>When you work in a team, <strong>merge conflicts</strong> do happen. When somebody commits changes to a file that you&rsquo;re working on, and their changes overlap with yours because both of you changed the same lines, Git can&rsquo;t decide whether to keep your changes or your teammate&rsquo;s. So you&rsquo;ll get these unfortunate arrows and symbols:</p>
1430 <p><a href="https://files.realpython.com/media/pycharm-conflicts.74b23b9ec798.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-conflicts.74b23b9ec798.png" width="996" height="691" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-conflicts.74b23b9ec798.png&amp;w=249&amp;sig=d02221be0ce12dbc9a8ea7514e047bb608b16c08 249w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-conflicts.74b23b9ec798.png&amp;w=498&amp;sig=1286c7aa0f62795e01c6a2f0607096b8c0d67b20 498w, https://files.realpython.com/media/pycharm-conflicts.74b23b9ec798.png 996w" sizes="75vw" alt="Conflicts in PyCharm"/></a></p>
1431 <p>This looks strange, and it&rsquo;s difficult to figure out which changes should be deleted and which ones should stay. PyCharm to the rescue! It has a much nicer and cleaner way of resolving conflicts. Go to <em>VCS</em> in the top menu, choose <em>Git</em> and then <em>Resolve conflicts&hellip;</em>. Choose the file whose conflicts you want to resolve and click on <em>Merge</em>. You will see the following window open:</p>
1432 <p><a href="https://files.realpython.com/media/pycharm-conflict-resolving-window.eea8f79a12b2.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-conflict-resolving-window.eea8f79a12b2.png" width="1174" height="709" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-conflict-resolving-window.eea8f79a12b2.png&amp;w=293&amp;sig=ef195e8acbb5ec9fa55fca46a43486995c2efca7 293w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-conflict-resolving-window.eea8f79a12b2.png&amp;w=587&amp;sig=067731ba579736c92c747eb128b3fb02b0d68417 587w, https://files.realpython.com/media/pycharm-conflict-resolving-window.eea8f79a12b2.png 1174w" sizes="75vw" alt="Conflict resolving window in PyCharm"/></a></p>
1433 <p>In the left column, you&rsquo;ll see your changes, and in the right column, the changes made by your teammate. The middle column shows the result. The conflicting lines are highlighted, with a little <em>X</em> and <em>&gt;&gt;</em>/<em>&lt;&lt;</em> right beside them. Click an arrow to accept that change, or the <em>X</em> to decline it. After you resolve all the conflicts, click the <em>Apply</em> button:</p>
1434 <p><a href="https://files.realpython.com/media/pycharm-resolving-conflicts.d3128ce78c45.gif" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-resolving-conflicts.d3128ce78c45.gif" width="1200" height="720" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-resolving-conflicts.d3128ce78c45.gif&amp;w=300&amp;sig=099dcca659431f9d2a1315b1fea5d7cbe246425c 300w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-resolving-conflicts.d3128ce78c45.gif&amp;w=600&amp;sig=6b98751300d8750c93d80cf7b65e96fb57d81714 600w, https://files.realpython.com/media/pycharm-resolving-conflicts.d3128ce78c45.gif 1200w" sizes="75vw" alt="Resolving Conflicts in PyCharm"/></a></p>
1435 <p>In the GIF above, for the first conflicting line, the author declined his own changes and accepted his teammate&rsquo;s. Conversely, for the second conflicting line, the author accepted his own changes and declined his teammate&rsquo;s.</p>
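<p>Under the hood, those arrows and symbols are Git&rsquo;s conflict markers: <code>&lt;&lt;&lt;&lt;&lt;&lt;&lt;</code> opens your side of a conflicting hunk, <code>=======</code> separates it from your teammate&rsquo;s side, and <code>&gt;&gt;&gt;&gt;&gt;&gt;&gt;</code> closes it. The following Python sketch mimics what the merge window does when you click the arrows: for each hunk, it keeps either your lines or your teammate&rsquo;s. The <code>resolve_conflicts()</code> helper and its <code>"ours"</code>/<code>"theirs"</code> choices are hypothetical and only illustrate the idea. The sketch ignores edge cases such as diff3-style <code>|||||||</code> sections.</p>

```python
def resolve_conflicts(text, choices):
    """Resolve Git conflict hunks in text.

    choices holds one entry per conflicting hunk, in order:
    "ours" keeps your side, "theirs" keeps your teammate's side.
    """
    result, ours, theirs = [], [], []
    state = None  # None outside a hunk, "ours" or "theirs" inside one
    hunk = 0
    for line in text.splitlines(keepends=True):
        if line.startswith("<<<<<<<"):
            state, ours, theirs = "ours", [], []
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            # End of hunk: keep only the side chosen for this hunk.
            result.extend(ours if choices[hunk] == "ours" else theirs)
            hunk += 1
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
        else:
            result.append(line)  # Non-conflicting line, keep as-is.
    return "".join(result)


conflicted = """\
def greet(name):
<<<<<<< HEAD
    greeting = "Hi"
=======
    greeting = "Hello"
>>>>>>> feature
<<<<<<< HEAD
    return f"{greeting}, {name}!"
=======
    return greeting + " " + name
>>>>>>> feature
"""

# Like in the GIF: take the teammate's first change, keep your own second one.
print(resolve_conflicts(conflicted, ["theirs", "ours"]))
```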
1436 <p>There&rsquo;s a lot more that you can do with the VCS integration in PyCharm. For more details, see <a href="https://www.jetbrains.com/help/pycharm/version-control-integration.html">this documentation</a>.</p>
1437 <h2 id="using-plugins-and-external-tools-in-pycharm">Using Plugins and External Tools in PyCharm</h2>
1438 <p>You can find almost everything you need for development in PyCharm. If you can&rsquo;t, there&rsquo;s probably a <a href="https://plugins.jetbrains.com/">plugin</a> that adds the functionality you need. For example, plugins can:</p>
1439 <ul>
1440 <li>Add support for various languages and frameworks </li>
1441 <li>Boost your productivity with shortcut hints, file watchers, and so on </li>
1442 <li>Help you learn a new programming language with coding exercises</li>
1443 </ul>
1444 <p>For instance, <a href="https://plugins.jetbrains.com/plugin/164-ideavim">IdeaVim</a> adds Vim emulation to PyCharm. If you like Vim, this can be a pretty good combination. </p>
1445 <p><a href="https://plugins.jetbrains.com/plugin/8006-material-theme-ui">Material Theme UI</a> changes the appearance of PyCharm to a Material Design look and feel: </p>
1446 <p><a href="https://files.realpython.com/media/pycharm-material-theme.178175815adc.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pycharm-material-theme.178175815adc.png" width="1110" height="743" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-material-theme.178175815adc.png&amp;w=277&amp;sig=60ec4c8b5f6a89af345a230518e21ee8a33d174b 277w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-material-theme.178175815adc.png&amp;w=555&amp;sig=fdac8603145ed77eb443abcbbe8ebdb9811bdff4 555w, https://files.realpython.com/media/pycharm-material-theme.178175815adc.png 1110w" sizes="75vw" alt="Material Theme in PyCharm"/></a></p>
1447 <p><a href="https://plugins.jetbrains.com/plugin/9442-vue-js">Vue.js</a> adds support for <a href="https://vuejs.org/">Vue.js</a> projects. <a href="https://plugins.jetbrains.com/plugin/7793-markdown">Markdown</a> lets you edit Markdown files within the IDE and see the rendered HTML in a live preview. You can find and install all of the available plugins by going to <em>Preferences → Plugins</em> on Mac or <em>Settings → Plugins</em> on Windows or Linux, under the <em>Marketplace</em> tab:</p>
1448 <p><a href="https://files.realpython.com/media/pycharm-plugin-marketplace.7d1cecfdc8b3.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-plugin-marketplace.7d1cecfdc8b3.png" width="1047" height="687" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-plugin-marketplace.7d1cecfdc8b3.png&amp;w=261&amp;sig=8d4a9ba35b5eb27b5604f86108b79f28e40f3cc9 261w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-plugin-marketplace.7d1cecfdc8b3.png&amp;w=523&amp;sig=ec5e20ae96bf9ee37a1011bcc2203e7d1599b4a4 523w, https://files.realpython.com/media/pycharm-plugin-marketplace.7d1cecfdc8b3.png 1047w" sizes="75vw" alt="Plugin Marketplace in PyCharm"/></a></p>
1449 <p>If you can&rsquo;t find what you need, you can even <a href="http://www.jetbrains.org/intellij/sdk/docs/basics.html">develop your own plugin</a>.</p>
1450 <p>If you can&rsquo;t find the right plugin and don&rsquo;t want to develop your own because there&rsquo;s already a package in PyPI, then you can add it to PyCharm as an external tool. Take <a href="http://flake8.pycqa.org/en/latest/"><code>Flake8</code></a>, the code analyzer, as an example. </p>
1451 <p>First, install <code>flake8</code> in your virtualenv with <code>pip install flake8</code> in the Terminal app of your choice. You can also use the one integrated into PyCharm:</p>
1452 <p><a href="https://files.realpython.com/media/pycharm-terminal.bb20cae6697e.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pycharm-terminal.bb20cae6697e.png" width="972" height="646" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-terminal.bb20cae6697e.png&amp;w=243&amp;sig=860217f31e60a4bb574e169ee05b6788cacaa388 243w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-terminal.bb20cae6697e.png&amp;w=486&amp;sig=26a30f328e42cc19e7e652c44767093480dbf352 486w, https://files.realpython.com/media/pycharm-terminal.bb20cae6697e.png 972w" sizes="75vw" alt="Terminal in PyCharm"/></a></p>
1453 <p>Next, go to <em>Preferences → Tools</em> on Mac or <em>Settings → Tools</em> on Windows or Linux, and choose <em>External Tools</em>. Click the little <em>+</em> button at the bottom (1). In the new popup window, enter the details as shown below and click <em>OK</em> in both windows:</p>
1454 <p><a href="https://files.realpython.com/media/pycharm-flake8-tool.3963506224b4.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pycharm-flake8-tool.3963506224b4.png" width="1082" height="720" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-flake8-tool.3963506224b4.png&amp;w=270&amp;sig=152f2ccf0a75a950b6a5cd6b5087507b66288595 270w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-flake8-tool.3963506224b4.png&amp;w=541&amp;sig=ee7221ee226076dfeaedfd5d2756263e351d8d14 541w, https://files.realpython.com/media/pycharm-flake8-tool.3963506224b4.png 1082w" sizes="75vw" alt="Flake8 tool in PyCharm"/></a></p>
1455 <p>Here, <em>Program</em> (2) is the Flake8 executable, which you&rsquo;ll find in the <em>bin/</em> folder of your virtual environment (<em>Scripts\</em> on Windows). <em>Arguments</em> (3) is the file you want Flake8 to analyze, and <em>Working directory</em> is the directory of your project.</p>
1456 <p>You could hardcode the absolute paths for everything here, but that would mean that you couldn&rsquo;t use this external tool in other projects. You would be able to use it only inside one project for one file. </p>
1457 <p>Instead, use <em>Macros</em>. Macros are variables in the format <code>$name$</code> whose values change according to your context. For example, <code>$FileName$</code> is <code>first.py</code> when you&rsquo;re editing <code>first.py</code>, and <code>second.py</code> when you&rsquo;re editing <code>second.py</code>. To see the full list and insert any of them, click the <em>Insert Macro&hellip;</em> buttons. Because the tool is configured with macros, its values adapt to the project and file you&rsquo;re currently working on, and Flake8 keeps doing its job properly.</p>
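<p>Conceptually, a macro is a placeholder that PyCharm substitutes right before it launches the tool. The sketch below illustrates that substitution in plain Python. It&rsquo;s not PyCharm&rsquo;s implementation: the <code>expand_macros()</code> helper and the example paths are made up, although <code>$FileName$</code>, <code>$FilePath$</code>, and <code>$ProjectFileDir$</code> are real PyCharm macro names.</p>

```python
import re

# Hypothetical context values; PyCharm fills these in from the file you're editing.
context = {
    "FileName": "example.py",
    "FilePath": "/home/user/projects/demo/example.py",
    "ProjectFileDir": "/home/user/projects/demo",
}

MACRO = re.compile(r"\$(\w+)\$")


def expand_macros(template, context):
    """Replace every $Name$ placeholder in template with its value from context."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown macros untouched instead of failing.
        return context.get(name, match.group(0))

    return MACRO.sub(substitute, template)


print(expand_macros("$FilePath$", context))        # /home/user/projects/demo/example.py
print(expand_macros("$ProjectFileDir$", context))  # /home/user/projects/demo
```

<p>Switching to another file only changes the values in <code>context</code>; the tool&rsquo;s configuration stays the same, which is why one external tool definition works across projects.</p>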
1458 <p>To try it out, create a file named <code>example.py</code> with the following code:</p>
1459 <div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="n">CONSTANT_VAR</span> <span class="o">=</span> <span class="mi">1</span>
1460 <span class="lineno"> 2 </span>
1461 <span class="lineno"> 3 </span>
1462 <span class="lineno"> 4 </span>
1463 <span class="lineno"> 5 </span><span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
1464 <span class="lineno"> 6 </span>    <span class="n">c</span> <span class="o">=</span> <span class="s2">&quot;hello&quot;</span>
1465 <span class="lineno"> 7 </span>    <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
1466 </pre></div>
1467 
1468 <p>This code deliberately breaks some of the Flake8 rules. Right-click anywhere in the editor for this file, choose <em>External Tools</em>, and then choose <em>Flake8</em>. Voilà! The output of the Flake8 analysis appears at the bottom:</p>
1469 <p><a href="https://files.realpython.com/media/pycharm-flake8-output.5b78e911e6d3.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/pycharm-flake8-output.5b78e911e6d3.png" width="997" height="634" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-flake8-output.5b78e911e6d3.png&amp;w=249&amp;sig=8fecbaaf9d4e2daa2bbe443be4b6dee2634f2a46 249w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-flake8-output.5b78e911e6d3.png&amp;w=498&amp;sig=4e41c6dbac28fbe23c18716f49c3cadf63767500 498w, https://files.realpython.com/media/pycharm-flake8-output.5b78e911e6d3.png 997w" sizes="75vw" alt="Flake8 Output in PyCharm"/></a></p>
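<p>The file has two problems that Flake8 is likely to flag: more than two blank lines before <code>add()</code> (pycodestyle&rsquo;s E303) and a local variable <code>c</code> that&rsquo;s assigned but never used (pyflakes&rsquo; F841). The following toy checker reproduces simplified versions of those two checks with the standard library&rsquo;s <code>ast</code> module. It&rsquo;s an illustration, not how Flake8 is implemented, and the function names are hypothetical.</p>

```python
import ast

# The example.py code from above, as a string.
SOURCE = """\
CONSTANT_VAR = 1



def add(a, b):
    c = "hello"
    return a + b
"""


def too_many_blank_lines(source, limit=2):
    """Toy E303-style check: report lines preceded by more than `limit` blank lines."""
    problems = []
    blank = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            blank += 1
            continue
        if blank > limit:
            problems.append((lineno, f"too many blank lines ({blank})"))
        blank = 0
    return problems


def unused_locals(source):
    """Toy F841-style check: names assigned inside a function but never read there."""
    problems = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        stored, loaded = {}, set()
        for node in ast.walk(func):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                stored.setdefault(node.id, node.lineno)
            elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
        for name, lineno in stored.items():
            if name not in loaded:
                problems.append((lineno, f"local variable '{name}' is assigned to but never used"))
    return problems


for lineno, message in too_many_blank_lines(SOURCE) + unused_locals(SOURCE):
    print(f"example.py:{lineno}: {message}")
```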
1470 <p>To make this even more convenient, you can add a keyboard shortcut for the Flake8 tool. Go to <em>Preferences</em> on Mac or to <em>Settings</em> on Windows or Linux. Then, go to <em>Keymap → External Tools → External Tools</em>. Double-click <em>Flake8</em> and choose <em>Add Keyboard Shortcut</em>. You&rsquo;ll see this window:</p>
1471 <p><a href="https://files.realpython.com/media/pycharm-add-shortcut.8c66b2bd12c0.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pycharm-add-shortcut.8c66b2bd12c0.png" width="1084" height="724" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-add-shortcut.8c66b2bd12c0.png&amp;w=271&amp;sig=e95cc32634af125588e6881ec6992dace79ec667 271w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pycharm-add-shortcut.8c66b2bd12c0.png&amp;w=542&amp;sig=836cbeec5256df8e8048aa6a0c3e7d89b2bac444 542w, https://files.realpython.com/media/pycharm-add-shortcut.8c66b2bd12c0.png 1084w" sizes="75vw" alt="Add shortcut in PyCharm"/></a></p>
1472 <p>In the image above, the shortcut is <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-alt">Alt</kbd><span>+</span><kbd class="key-a">A</kbd></span> for this tool. Enter your preferred shortcut in the text box and click <em>OK</em> in both windows. Now you can use that shortcut to analyze the file you&rsquo;re currently working on with Flake8.</p>
1473 <h2 id="pycharm-professional-features">PyCharm Professional Features</h2>
1474 <p>PyCharm Professional is a paid version of PyCharm with more out-of-the-box features and integrations. In this section, you&rsquo;ll mainly be presented with overviews of its main features and links to the official documentation, where each feature is discussed in detail. Remember that none of the following features is available in the Community edition. </p>
1475 <h3 id="django-support">Django Support</h3>
1476 <p>PyCharm has extensive support for <a href="https://www.djangoproject.com/">Django</a>, one of the most popular and beloved <a href="https://realpython.com/learning-paths/become-python-web-developer/">Python web frameworks</a>. To make sure that it&rsquo;s enabled, do the following:</p>
1477 <ol>
1478 <li>Open <em>Preferences</em> on Mac or <em>Settings</em> on Windows or Linux.</li>
1479 <li>Choose <em>Languages and Frameworks</em>.</li>
1480 <li>Choose <em>Django</em>.</li>
1481 <li>Check the checkbox <em>Enable Django support</em>.</li>
1482 <li>Apply changes.</li>
1483 </ol>
1484 <p>Now that you&rsquo;ve enabled Django support, your Django development journey will be a lot easier in PyCharm:</p>
1485 <ul>
1486 <li>When creating a project, you&rsquo;ll have a dedicated Django project type. This means that, when you choose this type, you&rsquo;ll have all the necessary files and settings. This is the equivalent of using <code>django-admin startproject mysite</code>.</li>
1487 <li>You can run <code>manage.py</code> commands directly inside PyCharm. </li>
1488 <li>Django templates are supported, including:<ul>
1489 <li>Syntax and error highlighting</li>
1490 <li>Code completion</li>
1491 <li>Navigation</li>
1492 <li>Completion for block names</li>
1493 <li>Completion for custom tags and filters</li>
1494 <li>Quick documentation for tags and filters</li>
1495 <li>Capability to debug them</li>
1496 </ul>
1497 </li>
1498 <li>Code completion in all other Django parts such as views, URLs and models, and code insight support for Django ORM.</li>
1499 <li>Model dependency diagrams for Django models.</li>
1500 </ul>
1501 <p>For more details on Django support, see the <a href="https://www.jetbrains.com/help/pycharm/django-support7.html">official documentation</a>.</p>
1502 <h3 id="database-support">Database Support</h3>
1503 <p>Modern database development is a complex task with many supporting systems and workflows. That&rsquo;s why JetBrains, the company behind PyCharm, developed a standalone IDE called <a href="https://www.jetbrains.com/datagrip/">DataGrip</a> for that. It&rsquo;s a separate product from PyCharm with a separate license. </p>
1504 <p>Luckily, PyCharm supports all the features that are available in DataGrip through a plugin called <em>Database tools and SQL</em>, which is enabled by default. With it, you can query, create, and manage databases, whether they&rsquo;re running locally, on a server, or in the cloud. The plugin supports MySQL, PostgreSQL, Microsoft SQL Server, SQLite, MariaDB, Oracle, Apache Cassandra, and others. For more information on what you can do with this plugin, check out <a href="https://www.jetbrains.com/help/pycharm/relational-databases.html">the comprehensive documentation on the database support</a>.</p>
1505 <h3 id="thread-concurrency-visualization">Thread Concurrency Visualization</h3>
1506 <p><a href="https://channels.readthedocs.io/en/latest/"><code>Django Channels</code></a>, <a href="https://realpython.com/async-io-python/"><code>asyncio</code></a>, and newer frameworks like <a href="https://www.starlette.io/"><code>Starlette</code></a> are examples of a growing trend in asynchronous Python programming. While asynchronous programs do bring a lot of benefits to the table, they&rsquo;re also notoriously hard to write and debug. In such cases, <em>Thread Concurrency Visualization</em> can be just what the doctor ordered because it helps you take full control over your multi-threaded applications and optimize them.</p>
1507 <p>Check out <a href="https://www.jetbrains.com/help/pycharm/thread-concurrency-visualization.html">the comprehensive documentation of this feature</a> for more details.</p>
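<p>To make this more concrete, here&rsquo;s a small multi-threaded sketch in which four worker threads compete for the same lock. Threads waiting on a shared lock are the kind of interaction that&rsquo;s hard to follow from logs alone. The <code>increment()</code> worker and the numbers are arbitrary choices for illustration.</p>

```python
import threading
from concurrent.futures import ThreadPoolExecutor

counter = 0
counter_lock = threading.Lock()


def increment(times):
    """Add 1 to the shared counter `times` times, holding the lock for each update."""
    global counter
    for _ in range(times):
        # Without the lock, concurrent read-modify-write updates could be lost.
        with counter_lock:
            counter += 1


# Leaving the with-block waits for all submitted tasks to finish.
with ThreadPoolExecutor(max_workers=4) as executor:
    for _ in range(4):
        executor.submit(increment, 10_000)

print(counter)  # 40000
```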
1508 <h3 id="profiler">Profiler</h3>
1509 <p>Speaking of optimization, profiling is another technique you can use to optimize your code. It shows you which parts of your code take up most of the execution time. PyCharm uses the first installed profiler in the following order of priority:</p>
1510 <ol>
1511 <li><a href="https://vmprof.readthedocs.io/en/latest/"><code>vmprof</code></a> </li>
1512 <li><a href="https://github.com/sumerc/yappi"><code>yappi</code></a></li>
1513 <li><a href="https://docs.python.org/3/library/profile.html"><code>cProfile</code></a></li>
1514 </ol>
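<p>That order of priority boils down to &ldquo;use the first profiler that&rsquo;s available.&rdquo; Here&rsquo;s a rough sketch of the idea in plain Python, assuming &ldquo;available&rdquo; means importable in the project&rsquo;s interpreter. The <code>pick_profiler()</code> function is hypothetical and isn&rsquo;t PyCharm&rsquo;s own code.</p>

```python
import importlib.util

# Candidate profiler modules, highest priority first.
PROFILERS = ["vmprof", "yappi", "cProfile"]


def pick_profiler(candidates=PROFILERS):
    """Return the name of the first candidate module that can be imported, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# cProfile ships with Python, so this always finds at least that one.
print(pick_profiler())
```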
1515 <p>If you don&rsquo;t have <code>vmprof</code> or <code>yappi</code> installed, then it&rsquo;ll fall back to the standard <code>cProfile</code>. It&rsquo;s <a href="https://www.jetbrains.com/help/pycharm/profiler.html">well-documented</a>, so I won&rsquo;t rehash it here. </p>
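<p>If you&rsquo;d like to see what <code>cProfile</code> reports on its own, outside PyCharm, here&rsquo;s a minimal sketch using the standard library&rsquo;s <code>cProfile</code> and <code>pstats</code> modules. The <code>slow_sum()</code> and <code>fast_sum()</code> functions are made-up examples, and <code>pstats.SortKey</code> requires Python 3.7 or later.</p>

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately slow: add the integers below n one at a time in a Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n):
    """Same result as slow_sum(), using the closed-form formula."""
    return n * (n - 1) // 2


profiler = cProfile.Profile()
profiler.enable()
slow_sum(1_000_000)
fast_sum(1_000_000)
profiler.disable()

# Print the five most expensive entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats(pstats.SortKey.CUMULATIVE)
stats.print_stats(5)
print(stream.getvalue())
```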
1516 <h3 id="scientific-mode">Scientific Mode</h3>
1517 <p>Python is not only a language for general and web programming. Over the last few years, it has also become one of the leading tools for data science and machine learning, thanks to libraries and tools like <a href="http://www.numpy.org/">NumPy</a>, <a href="https://www.scipy.org/">SciPy</a>, <a href="https://scikit-learn.org/">scikit-learn</a>, <a href="https://matplotlib.org/">Matplotlib</a>, <a href="https://jupyter.org/">Jupyter</a>, and more. To make the most of these libraries, you need an IDE that supports what they offer, such as plotting and data analysis. PyCharm provides everything you need, as <a href="https://www.jetbrains.com/help/pycharm/matplotlib-support.html">thoroughly documented here</a>.</p>
1518 <h3 id="remote-development">Remote Development</h3>
1519 <p>One common cause of bugs in many applications is that development and production environments differ. Although, in most cases, it&rsquo;s not possible to provide an exact copy of the production environment for development, pursuing it is a worthy goal.</p>
1520 <p>With PyCharm, you can debug your application using an interpreter that&rsquo;s located on another computer, such as a Linux VM. This lets you use the same interpreter as your production environment, which helps you fix and avoid many bugs caused by differences between development and production environments. Make sure to check out the <a href="https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html">official documentation</a> to learn more.</p>
1521 <h2 id="conclusion">Conclusion</h2>
1522 <p>PyCharm is one of the best, if not the best, full-featured, dedicated, and versatile IDEs for Python development. It offers a ton of benefits, saving you a lot of time by helping you with routine tasks. Now you know how to be productive with it!</p>
1523 <p>In this article, you learned a lot, including:</p>
1524 <ul>
1525 <li>Installing PyCharm</li>
1526 <li>Writing code in PyCharm</li>
1527 <li>Running your code in PyCharm</li>
1528 <li>Debugging and testing your code in PyCharm</li>
1529 <li>Editing an existing project in PyCharm</li>
1530 <li>Searching and navigating in PyCharm</li>
1531 <li>Using Version Control in PyCharm</li>
1532 <li>Using Plugins and External Tools in PyCharm</li>
1533 <li>Using PyCharm Professional features, such as Django support and Scientific mode</li>
1534 </ul>
1535 <p>If there&rsquo;s anything you&rsquo;d like to ask or share, please reach out in the comments below. There&rsquo;s also a lot more information at the <a href="https://www.jetbrains.com/pycharm/documentation/">PyCharm website</a> for you to explore.</p>
1536 <div class="alert alert-warning" role="alert"><p><strong>Clone Repo:</strong> <a href="https://realpython.com/optins/view/alcazar-web-framework/" class="alert-link" data-toggle="modal" data-target="#modal-alcazar-web-framework" data-focus="false">Click here to clone the repo you'll use</a> to explore the project-focused features of PyCharm in this tutorial.</p></div>
1537         <hr />
1538         <p><em>[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short &amp; sweet Python Trick delivered to your inbox every couple of days. <a href="https://realpython.com/python-tricks/?utm_source=realpython&amp;utm_medium=rss&amp;utm_campaign=footer">&gt;&gt; Click here to learn more and see examples</a> ]</em></p>#
1539 datePublished: #Wed Aug 28 14:00:00 2019#
1540 dateUpdated: #Wed Aug 28 14:00:00 2019#
1541 # Person begin ####################
1542 name: #Real Python#
1543 # Person end ######################
1544 # Item end ########################
1545 # Item begin ######################
1546 id: #https://realpython.com/courses/python-lambda-functions/#
1547 title: #How to Use Python Lambda Functions#
1548 link: #https://realpython.com/courses/python-lambda-functions/#
1549 description: #In this step-by-step course, you&apos;ll learn about Python lambda functions. You&apos;ll see how they compare with regular functions and how you can use them in accordance with best practices.#
1550 content: #<p>Python and other languages like Java, C#, and even C++ have had lambda functions added to their syntax, whereas languages like LISP or the ML family of languages, Haskell, OCaml, and F#, use lambdas as a core concept. Python lambdas are little, anonymous functions, subject to a more restrictive but more concise syntax than regular Python functions.</p>
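<p>As a quick illustration of that difference, here&rsquo;s a regular function next to an equivalent lambda. A lambda&rsquo;s body must be a single expression (no <code>return</code>, assignments, or other statements), and the resulting function object has no name of its own. The <code>add_one()</code> function is just an example.</p>

```python
def add_one(x):
    """A regular function: it has a name and may contain statements."""
    return x + 1


# The lambda equivalent is a single expression with no name of its own.
print((lambda x: x + 1)(2))  # 3
print(add_one(2))            # 3

print(add_one.__name__)            # add_one
print((lambda x: x + 1).__name__)  # <lambda>
```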
1551 <p><strong>By the end of this course, you&rsquo;ll know:</strong></p>
1552 <ul>
1553 <li>How Python lambdas came to be </li>
1554 <li>How lambdas compare with regular function objects</li>
1555 <li>How to write lambda functions</li>
1556 <li>Which functions in the Python standard library leverage lambdas</li>
1557 <li>When to use or avoid Python lambda functions</li>
1558 </ul>
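<p>The course&rsquo;s exact selection isn&rsquo;t listed here, but a few common standard library functions that take a function argument, and are therefore often called with a lambda, are <code>sorted()</code>, <code>map()</code>, <code>filter()</code>, and <code>functools.reduce()</code>:</p>

```python
from functools import reduce

words = ["banana", "Apple", "cherry"]

# sorted() accepts a key function; this lambda makes the sort case-insensitive.
print(sorted(words, key=lambda w: w.lower()))  # ['Apple', 'banana', 'cherry']

numbers = [1, 2, 3, 4, 5]
print(list(map(lambda n: n * n, numbers)))          # [1, 4, 9, 16, 25]
print(list(filter(lambda n: n % 2 == 0, numbers)))  # [2, 4]
print(reduce(lambda acc, n: acc + n, numbers))      # 15
```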
1559 <p>This course is mainly for intermediate to experienced Python programmers, but it&rsquo;s accessible to any curious mind with an interest in programming. All the examples included in this course have been tested with Python 3.7.</p>
1560         <hr />
1561         <p><em>[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short &amp; sweet Python Trick delivered to your inbox every couple of days. <a href="https://realpython.com/python-tricks/?utm_source=realpython&amp;utm_medium=rss&amp;utm_campaign=footer">&gt;&gt; Click here to learn more and see examples</a> ]</em></p>#
1562 datePublished: #Tue Aug 27 14:00:00 2019#
1563 dateUpdated: #Tue Aug 27 14:00:00 2019#
1564 # Person begin ####################
1565 name: #Real Python#
1566 # Person end ######################
1567 # Item end ########################
1568 # Item begin ######################
1569 id: #https://realpython.com/openpyxl-excel-spreadsheets-python/#
1570 title: #A Guide to Excel Spreadsheets in Python With openpyxl#
1571 link: #https://realpython.com/openpyxl-excel-spreadsheets-python/#
1572 description: #In this step-by-step tutorial, you&apos;ll learn how to handle spreadsheets in Python using the openpyxl package. You&apos;ll learn how to manipulate Excel spreadsheets, extract information from spreadsheets, create simple or more complex spreadsheets, including adding styles, charts, and so on.#
1573 content: #<p>Excel spreadsheets are one of those things you might have to deal with at some point. Whether it&rsquo;s because your boss loves them or because marketing needs them, you might have to learn how to work with spreadsheets, and that&rsquo;s when knowing <code>openpyxl</code> comes in handy!</p>
1574 <p>Spreadsheets are a very intuitive and user-friendly way to manipulate large datasets without any prior technical background. That&rsquo;s why they&rsquo;re still so commonly used today.</p>
1575 <p><strong>In this article, you&rsquo;ll learn how to use openpyxl to:</strong></p>
1576 <ul>
1577 <li>Manipulate Excel spreadsheets with confidence</li>
1578 <li>Extract information from spreadsheets</li>
1579 <li>Create simple or more complex spreadsheets, including adding styles, charts, and so on</li>
1580 </ul>
1581 <p>This article is written for intermediate developers who have a pretty good knowledge of Python data structures, such as <a href="https://realpython.com/python-dicts/">dicts</a> and <a href="https://realpython.com/python-lists-tuples/">lists</a>, and who also feel comfortable with <a href="https://realpython.com/python3-object-oriented-programming/">OOP</a> and other intermediate-level topics.</p>
1582 <div class="alert alert-warning" role="alert"><p><strong>Download Dataset:</strong> <a href="https://realpython.com/optins/view/openpyxl-sample-dataset/" class="alert-link" data-toggle="modal" data-target="#modal-openpyxl-sample-dataset" data-focus="false">Click here to download the dataset for the openpyxl exercise you'll be following in this tutorial.</a></p></div>
1583 
1584 <h2 id="before-you-begin">Before You Begin</h2>
1585 <p>If you ever get asked to extract some data from a database or log file into an Excel spreadsheet, or if you often have to convert an Excel spreadsheet into some more usable programmatic form, then this tutorial is perfect for you. Let&rsquo;s jump into the <code>openpyxl</code> caravan!</p>
1586 <h3 id="practical-use-cases">Practical Use Cases</h3>
1587 <p>First things first, when would you need to use a package like <code>openpyxl</code> in a real-world scenario? You&rsquo;ll see a few examples below, but really, there are hundreds of possible scenarios where this knowledge could come in handy.</p>
1588 <h4 id="importing-new-products-into-a-database">Importing New Products Into a Database</h4>
1589 <p>You&rsquo;re responsible for tech at an online store, and your boss doesn&rsquo;t want to pay for a cool and expensive CMS.</p>
1590 <p>Every time they want to add new products to the online store, they come to you with an Excel spreadsheet with a few hundred rows and, for each of them, you have the product name, description, price, and so forth.</p>
1591 <p>Now, to import the data, you&rsquo;ll have to iterate over each spreadsheet row and add each product to the online store.</p>
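<p>Here&rsquo;s a minimal sketch of that import loop, assuming the spreadsheet has a header row followed by name, description, and price columns. To keep it self-contained, it builds the workbook in memory and uses an in-memory SQLite table as a stand-in for the store&rsquo;s database. With a real file, you&rsquo;d pass its path to <code>load_workbook()</code> instead. It relies on <code>iter_rows(values_only=True)</code>, which is available in openpyxl 2.6 and later.</p>

```python
import io
import sqlite3

from openpyxl import Workbook, load_workbook

# Build a tiny stand-in for the spreadsheet your boss sends you.
# The column layout (name, description, price) is an assumption.
source = Workbook()
sheet = source.active
sheet.append(["name", "description", "price"])
sheet.append(["Mug", "Ceramic coffee mug", 9.5])
sheet.append(["T-shirt", "Cotton, size M", 15.0])
buffer = io.BytesIO()
source.save(buffer)
buffer.seek(0)

# Hypothetical products table standing in for the online store's database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE products (name TEXT, description TEXT, price REAL)")

workbook = load_workbook(buffer)
worksheet = workbook.active
# Skip the header row and read plain values instead of Cell objects.
for name, description, price in worksheet.iter_rows(min_row=2, values_only=True):
    connection.execute(
        "INSERT INTO products VALUES (?, ?, ?)", (name, description, price)
    )
connection.commit()

print(connection.execute("SELECT COUNT(*) FROM products").fetchone()[0])  # 2
```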
1592 <h4 id="exporting-database-data-into-a-spreadsheet">Exporting Database Data Into a Spreadsheet</h4>
1593 <p>Say you have a Database table where you record all your users&rsquo; information, including name, phone number, email address, and so forth.</p>
1594 <p>Now, the Marketing team wants to contact all users to offer them a discount or promotion. However, they don&rsquo;t have access to the Database, or they don&rsquo;t know how to use SQL to extract that information easily.</p>
1595 <p>What can you do to help? Well, you can make a quick script using <code>openpyxl</code> that iterates over every single User record and puts all the essential information into an Excel spreadsheet.</p>
1596 <p>That&rsquo;s gonna earn you an extra slice of cake at your company&rsquo;s next birthday party!</p>
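<p>A sketch of such a script might look like the following. The <code>users</code> table, its columns, and the output filename are assumptions, and an in-memory SQLite database stands in for your real one.</p>

```python
import sqlite3

from openpyxl import Workbook

# Hypothetical users table standing in for your real database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (name TEXT, phone TEXT, email TEXT)")
connection.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("Ada", "555-0100", "ada@example.com"),
        ("Linus", "555-0101", "linus@example.com"),
    ],
)

workbook = Workbook()
sheet = workbook.active
sheet.title = "Users"
sheet.append(["Name", "Phone", "Email"])  # Header row for the Marketing team.
# Each database row becomes one spreadsheet row.
for row in connection.execute("SELECT name, phone, email FROM users ORDER BY name"):
    sheet.append(row)

workbook.save("users_for_marketing.xlsx")
```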
1597 <h4 id="appending-information-to-an-existing-spreadsheet">Appending Information to an Existing Spreadsheet</h4>
1598 <p>You may also have to open a spreadsheet, read the information in it and, according to some business logic, append more data to it.</p>
1599 <p>For example, using the online store scenario again, say you get an Excel spreadsheet with a list of users and you need to append to each row the total amount they&rsquo;ve spent in your store.</p>
1600 <p>This data is in the Database, so you have to read the spreadsheet, iterate through each row, fetch the total amount spent from the Database, and then write it back to the spreadsheet.</p>
1601 <p>Not a problem for <code>openpyxl</code>!</p>
1602 <h3 id="learning-some-basic-excel-terminology">Learning Some Basic Excel Terminology</h3>
1603 <p>Here&rsquo;s a quick list of basic terms you&rsquo;ll see when you&rsquo;re working with Excel spreadsheets:</p>
1604 <div class="table-responsive">
1605 <table class="table table-hover">
1606 <thead>
1607 <tr>
1608 <th>Term</th>
1609 <th>Explanation</th>
1610 </tr>
1611 </thead>
1612 <tbody>
1613 <tr>
1614 <td>Spreadsheet or Workbook</td>
1615 <td>A <strong>Spreadsheet</strong> is the main file you are creating or working with.</td>
1616 </tr>
1617 <tr>
1618 <td>Worksheet or Sheet</td>
1619 <td>A <strong>Sheet</strong> is used to split different kinds of content within the same spreadsheet. A <strong>Spreadsheet</strong> can have one or more <strong>Sheets</strong>.</td>
1620 </tr>
1621 <tr>
1622 <td>Column</td>
1623 <td>A <strong>Column</strong> is a vertical line, and it&rsquo;s represented by an uppercase letter: <em>A</em>.</td>
1624 </tr>
1625 <tr>
1626 <td>Row</td>
1627 <td>A <strong>Row</strong> is a horizontal line, and it&rsquo;s represented by a number: <em>1</em>.</td>
1628 </tr>
1629 <tr>
1630 <td>Cell</td>
1631 <td>A <strong>Cell</strong> is a combination of <strong>Column</strong> and <strong>Row</strong>, represented by both an uppercase letter and a number: <em>A1</em>.</td>
1632 </tr>
1633 </tbody>
1634 </table>
1635 </div>
1636 <h3 id="getting-started-with-openpyxl">Getting Started With openpyxl</h3>
1637 <p>Now that you&rsquo;re aware of the benefits of a tool like <code>openpyxl</code>, let&rsquo;s get down to it and start by installing the package. For this tutorial, you should use Python 3.7 and openpyxl 2.6.2. To install the package, you can do the following:</p>
1638 <div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install openpyxl
1639 </pre></div>
1640 
1641 <p>After you install the package, you should be able to create a super simple spreadsheet with the following code:</p>
1642 <div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
1643 
1644 <span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
1645 <span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
1646 
1647 <span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A1&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;hello&quot;</span>
1648 <span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;B1&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;world!&quot;</span>
1649 
1650 <span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;hello_world.xlsx&quot;</span><span class="p">)</span>
1651 </pre></div>
1652 
1653 <p>The code above should create a file called <code>hello_world.xlsx</code> in the directory you run the code from. If you open that file with Excel, you should see something like this:</p>
1654 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_16.54.45.e646867e4dbb.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_16.54.45.e646867e4dbb.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_16.54.45.e646867e4dbb.png&amp;w=540&amp;sig=4c3acdcf35f528b6ed0cf6e299c2575781934414 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_16.54.45.e646867e4dbb.png&amp;w=1080&amp;sig=328d4ff12cec767d684f5b7666380d9f23a2a548 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_16.54.45.e646867e4dbb.png 2160w" sizes="75vw" alt="A Simple Hello World Spreadsheet"/></a></p>
1655 <p><em>Woohoo</em>, you&rsquo;ve created your first spreadsheet!</p>
1656 <h2 id="reading-excel-spreadsheets-with-openpyxl">Reading Excel Spreadsheets With openpyxl</h2>
1657 <p>Let&rsquo;s start with the most essential thing one can do with a spreadsheet: read it.</p>
1658 <p>You&rsquo;ll go from a straightforward approach to reading a spreadsheet to more complex examples where you read the data and convert it into more useful Python structures.</p>
1659 <h3 id="dataset-for-this-tutorial">Dataset for This Tutorial</h3>
1660 <p>Before you dive deep into some code examples, you should <strong>download this sample dataset</strong> and store it somewhere as <code>sample.xlsx</code>:</p>
1661 <div class="alert alert-warning" role="alert"><p><strong>Download Dataset:</strong> <a href="https://realpython.com/optins/view/openpyxl-sample-dataset/" class="alert-link" data-toggle="modal" data-target="#modal-openpyxl-sample-dataset" data-focus="false">Click here to download the dataset for the openpyxl exercise you'll be following in this tutorial.</a></p></div>
1662 
1663 <p>This is one of the datasets you&rsquo;ll be using throughout this tutorial, and it&rsquo;s a spreadsheet with a sample of real data from Amazon&rsquo;s online product reviews. This dataset is only a tiny fraction of what Amazon <a href="https://registry.opendata.aws/amazon-reviews/">provides</a>, but for testing purposes, it&rsquo;s more than enough.</p>
1664 <h3 id="a-simple-approach-to-reading-an-excel-spreadsheet">A Simple Approach to Reading an Excel Spreadsheet</h3>
1665 <p>Finally, let&rsquo;s start reading some spreadsheets! To begin with, open our sample spreadsheet:</p>
1666 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
1667 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample.xlsx&quot;</span><span class="p">)</span>
1668 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
1669 <span class="go">[&#39;Sheet 1&#39;]</span>
1670 
1671 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
1672 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span>
1673 <span class="go">&lt;Worksheet &quot;Sheet 1&quot;&gt;</span>
1674 
1675 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">title</span>
1676 <span class="go">&#39;Sheet 1&#39;</span>
1677 </pre></div>
1678 
1679 <p>In the code above, you first open the spreadsheet <code>sample.xlsx</code> using <code>load_workbook()</code>, and then you use <code>workbook.sheetnames</code> to see all the sheets available to work with. After that, <code>workbook.active</code> returns the active sheet, which is the first sheet unless the file was saved with a different one selected. In this case, it selects <strong>Sheet 1</strong> automatically. Using <code>load_workbook()</code> together with these attributes is the default way of opening a spreadsheet, and you&rsquo;ll see it many times during this tutorial.</p>
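<p>When a workbook has several sheets, you&rsquo;ll often want to pick one by name instead of relying on <code>.active</code>. Here&rsquo;s a minimal sketch of that, assuming openpyxl is installed. It builds a hypothetical two-sheet workbook in memory (the sheet names <code>Reviews</code> and <code>Products</code> are made up for this example), reloads it with <code>load_workbook()</code>, and then looks up a sheet with <code>workbook[name]</code>:</p>

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a hypothetical two-sheet workbook and keep it in memory
original = Workbook()
original.active.title = "Reviews"           # a new Workbook starts with one sheet
products_sheet = original.create_sheet(title="Products")
products_sheet["A1"] = "product_id"

buffer = BytesIO()
original.save(buffer)                        # save() also accepts file-like objects

# Reload it the same way you'd load sample.xlsx from disk
workbook = load_workbook(filename=BytesIO(buffer.getvalue()))
print(workbook.sheetnames)                   # all sheets, in tab order
print(workbook.active.title)                 # the sheet that was active when saved
products = workbook["Products"]              # select a sheet by its name
print(products["A1"].value)
```

<p>Selecting sheets by name keeps a script working even if someone later reorders the tabs in Excel.</p>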
1680 <p>Now, after opening a spreadsheet, you can easily retrieve data from it like this:</p>
1681 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A1&quot;</span><span class="p">]</span>
1682 <span class="go">&lt;Cell &#39;Sheet 1&#39;.A1&gt;</span>
1683 
1684 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A1&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">value</span>
1685 <span class="go">&#39;marketplace&#39;</span>
1686 
1687 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;F10&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">value</span>
1688 <span class="go">&quot;G-Shock Men&#39;s Grey Sport Watch&quot;</span>
1689 </pre></div>
1690 
1691 <p>To get the actual value of a cell, you need to access its <code>.value</code> attribute. Otherwise, you&rsquo;ll get the <code>Cell</code> object itself. You can also use the <code>.cell()</code> method to retrieve a cell by its row and column numbers. Remember to add <code>.value</code> to get the actual value and not a <code>Cell</code> object:</p>
1692 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">cell</span><span class="p">(</span><span class="n">row</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">6</span><span class="p">)</span>
1693 <span class="go">&lt;Cell &#39;Sheet 1&#39;.F10&gt;</span>
1694 
1695 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">cell</span><span class="p">(</span><span class="n">row</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">6</span><span class="p">)</span><span class="o">.</span><span class="n">value</span>
1696 <span class="go">&quot;G-Shock Men&#39;s Grey Sport Watch&quot;</span>
1697 </pre></div>
1698 
1699 <p>The results are the same no matter which approach you choose. However, in this tutorial, you&rsquo;ll mostly use the first one: <code>["A1"]</code>.</p>
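<p>Both notations resolve to the very same <code>Cell</code> object, so you can mix them freely. The sketch below uses an in-memory workbook instead of <code>sample.xlsx</code>. It also relies on the optional <code>value</code> argument of <code>.cell()</code> to write to a cell while retrieving it:</p>

```python
from openpyxl import Workbook

sheet = Workbook().active

# .cell() can also write: the optional value argument sets the content
sheet.cell(row=10, column=6, value="G-Shock Men's Grey Sport Watch")

by_coordinate = sheet["F10"]                 # letter-and-number notation
by_index = sheet.cell(row=10, column=6)      # row and column numbers

print(by_coordinate.coordinate, by_index.coordinate)
print(by_coordinate is by_index)             # both point at the same Cell
print(by_coordinate.value)
```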
1700 <div class="alert alert-primary" role="alert">
1701 <p><strong>Note:</strong> Even though you&rsquo;re used to zero-based indexing in Python, spreadsheet rows and columns use one-based indexing, so the first row or column always has index <code>1</code>.</p>
1702 </div>
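<p>In practice, this means you&rsquo;ll sometimes convert between Python&rsquo;s zero-based positions and a spreadsheet&rsquo;s one-based rows and columns. The following sketch shows that openpyxl rejects index <code>0</code>. It then uses the helpers <code>get_column_letter()</code> and <code>column_index_from_string()</code> from <code>openpyxl.utils</code> to translate between column letters and numbers. The header tuple is a shortened copy of the sample&rsquo;s first row:</p>

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string, get_column_letter

sheet = Workbook().active

# Row and column numbers start at 1, so 0 raises a ValueError
rejected_zero = False
try:
    sheet.cell(row=0, column=1)
except ValueError as error:
    rejected_zero = True
    print("Invalid index:", error)

# Translate between column letters and one-based column numbers
print(get_column_letter(6))            # F
print(column_index_from_string("F"))   # 6

# A zero-based tuple position needs + 1 to become a column number
header = ("marketplace", "customer_id", "review_id", "product_id")
position = header.index("product_id")  # 3, counting from 0
column = position + 1                  # 4, counting from 1
print(get_column_letter(column))       # D
```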
1703 <p>The above shows you the quickest way to open a spreadsheet. However, you can pass additional parameters to change the way a spreadsheet is loaded.</p>
1704 <h4 id="additional-reading-options">Additional Reading Options</h4>
1705 <p>There are a few arguments you can pass to <code>load_workbook()</code> that change the way a spreadsheet is loaded. The most important ones are the following two Booleans:</p>
1706 <ol>
1707 <li><strong>read_only</strong> loads a spreadsheet in read-only mode, which reads the cells lazily and lets you open very large Excel files.</li>
1708 <li><strong>data_only</strong> loads the values that were cached the last time the file was calculated, instead of the formulas themselves.</li>
1709 </ol>
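<p>One detail is easy to miss: openpyxl never calculates formulas itself. As a result, <code>data_only=True</code> returns whatever value was cached the last time a program like Excel calculated and saved the file. The sketch below illustrates both options on a hypothetical in-memory workbook. Because openpyxl wrote that file and nothing ever calculated it, the formula cell should come back as <code>None</code> in <code>data_only</code> mode:</p>

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a hypothetical workbook with two numbers and a formula
writer = Workbook()
sheet = writer.active
sheet["A1"] = 2
sheet["A2"] = 3
sheet["A3"] = "=A1+A2"
buffer = BytesIO()
writer.save(buffer)
data = buffer.getvalue()

# Default: formula cells return the formula text
with_formulas = load_workbook(filename=BytesIO(data))
formula_text = with_formulas.active["A3"].value

# data_only=True: returns the cached result; no program has calculated
# this file yet, so there's no cached value for A3
with_values = load_workbook(filename=BytesIO(data), data_only=True)
cached_value = with_values.active["A3"].value

# read_only=True: rows are read lazily, which keeps memory use low
# for large files; close the workbook to release the underlying file
streamed = load_workbook(filename=BytesIO(data), read_only=True)
rows = list(streamed.active.iter_rows(values_only=True))
streamed.close()

print(formula_text)
print(cached_value)
print(rows)
```

<p>If you need the results of formulas that openpyxl wrote, open and save the file in a spreadsheet program first so that the values get cached.</p>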
1710 <h3 id="importing-data-from-a-spreadsheet">Importing Data From a Spreadsheet</h3>
1711 <p>Now that you&rsquo;ve learned the basics about loading a spreadsheet, it&rsquo;s about time you get to the fun part: <strong>the iteration and actual usage of the values within the spreadsheet</strong>.</p>
1712 <p>This section is where you&rsquo;ll learn all the different ways you can iterate through the data, but also how to convert that data into something usable and, more importantly, how to do it in a Pythonic way.</p>
1713 <h4 id="iterating-through-the-data">Iterating Through the Data</h4>
1714 <p>There are a few different ways you can iterate through the data depending on your needs.</p>
1715 <p>You can slice the data with a combination of columns and rows:</p>
1716 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A1:C2&quot;</span><span class="p">]</span>
1717 <span class="go">((&lt;Cell &#39;Sheet 1&#39;.A1&gt;, &lt;Cell &#39;Sheet 1&#39;.B1&gt;, &lt;Cell &#39;Sheet 1&#39;.C1&gt;),</span>
1718 <span class="go"> (&lt;Cell &#39;Sheet 1&#39;.A2&gt;, &lt;Cell &#39;Sheet 1&#39;.B2&gt;, &lt;Cell &#39;Sheet 1&#39;.C2&gt;))</span>
1719 </pre></div>
1720 
1721 <p>You can get ranges of rows or columns:</p>
1722 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="c1"># Get all cells from column A</span>
1723 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A&quot;</span><span class="p">]</span>
1724 <span class="go">(&lt;Cell &#39;Sheet 1&#39;.A1&gt;,</span>
1725 <span class="go"> &lt;Cell &#39;Sheet 1&#39;.A2&gt;,</span>
1726 <span class="go"> ...</span>
1727 <span class="go"> &lt;Cell &#39;Sheet 1&#39;.A99&gt;,</span>
1728 <span class="go"> &lt;Cell &#39;Sheet 1&#39;.A100&gt;)</span>
1729 
1730 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Get all cells for a range of columns</span>
1731 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A:B&quot;</span><span class="p">]</span>
1732 <span class="go">((&lt;Cell &#39;Sheet 1&#39;.A1&gt;,</span>
1733 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.A2&gt;,</span>
1734 <span class="go">  ...</span>
1735 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.A99&gt;,</span>
1736 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.A100&gt;),</span>
1737 <span class="go"> (&lt;Cell &#39;Sheet 1&#39;.B1&gt;,</span>
1738 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.B2&gt;,</span>
1739 <span class="go">  ...</span>
1740 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.B99&gt;,</span>
1741 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.B100&gt;))</span>
1742 
1743 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Get all cells from row 5</span>
1744 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span>
1745 <span class="go">(&lt;Cell &#39;Sheet 1&#39;.A5&gt;,</span>
1746 <span class="go"> &lt;Cell &#39;Sheet 1&#39;.B5&gt;,</span>
1747 <span class="go"> ...</span>
1748 <span class="go"> &lt;Cell &#39;Sheet 1&#39;.N5&gt;,</span>
1749 <span class="go"> &lt;Cell &#39;Sheet 1&#39;.O5&gt;)</span>
1750 
1751 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Get all cells for a range of rows</span>
1752 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="mi">5</span><span class="p">:</span><span class="mi">6</span><span class="p">]</span>
1753 <span class="go">((&lt;Cell &#39;Sheet 1&#39;.A5&gt;,</span>
1754 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.B5&gt;,</span>
1755 <span class="go">  ...</span>
1756 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.N5&gt;,</span>
1757 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.O5&gt;),</span>
1758 <span class="go"> (&lt;Cell &#39;Sheet 1&#39;.A6&gt;,</span>
1759 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.B6&gt;,</span>
1760 <span class="go">  ...</span>
1761 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.N6&gt;,</span>
1762 <span class="go">  &lt;Cell &#39;Sheet 1&#39;.O6&gt;))</span>
1763 </pre></div>
1764 
1765 <p>You&rsquo;ll notice that all of the above examples return a <code>tuple</code>. If you want to refresh your memory on how to handle <code>tuples</code> in Python, check out the article on <a href="https://realpython.com/python-lists-tuples/#python-tuples">Lists and Tuples in Python</a>.</p>
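<p>The shape of that tuple depends on what you ask for. A single column or row gives you a flat tuple of <code>Cell</code> objects, while a range gives you a tuple of tuples. The minimal sketch below checks this on a hypothetical three-by-three block of numbers and uses list comprehensions to pull out the values:</p>

```python
from openpyxl import Workbook

sheet = Workbook().active
# Fill a hypothetical three-by-three block: A1=11, B1=12, ..., C3=33
for row in range(1, 4):
    for column in range(1, 4):
        sheet.cell(row=row, column=column, value=row * 10 + column)

column_a = sheet["A"]      # one column: flat tuple of cells
columns_ab = sheet["A:B"]  # column range: one tuple per column
row_2 = sheet[2]           # one row: flat tuple of cells
rows_2_3 = sheet[2:3]      # row range: one tuple per row

print(len(column_a), len(columns_ab), len(columns_ab[0]))
print([cell.value for cell in row_2])
print([[cell.value for cell in row] for row in rows_2_3])
```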
1766 <p>You can also go through the data using regular Python <a href="https://realpython.com/introduction-to-python-generators/">generators</a>. The main methods you can use for this are:</p>
1767 <ul>
1768 <li><code>.iter_rows()</code></li>
1769 <li><code>.iter_cols()</code></li>
1770 </ul>
1771 <p>Both methods can receive the following arguments:</p>
1772 <ul>
1773 <li><code>min_row</code></li>
1774 <li><code>max_row</code></li>
1775 <li><code>min_col</code></li>
1776 <li><code>max_col</code></li>
1777 </ul>
1778 <p>These arguments are used to set boundaries for the iteration:</p>
1779 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
1780 <span class="gp">... </span>                           <span class="n">max_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
1781 <span class="gp">... </span>                           <span class="n">min_col</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
1782 <span class="gp">... </span>                           <span class="n">max_col</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
1783 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
1784 <span class="go">(&lt;Cell &#39;Sheet 1&#39;.A1&gt;, &lt;Cell &#39;Sheet 1&#39;.B1&gt;, &lt;Cell &#39;Sheet 1&#39;.C1&gt;)</span>
1785 <span class="go">(&lt;Cell &#39;Sheet 1&#39;.A2&gt;, &lt;Cell &#39;Sheet 1&#39;.B2&gt;, &lt;Cell &#39;Sheet 1&#39;.C2&gt;)</span>
1786 
1787 
1788 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">column</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_cols</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
1789 <span class="gp">... </span>                              <span class="n">max_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
1790 <span class="gp">... </span>                              <span class="n">min_col</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
1791 <span class="gp">... </span>                              <span class="n">max_col</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
1792 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">column</span><span class="p">)</span>
1793 <span class="go">(&lt;Cell &#39;Sheet 1&#39;.A1&gt;, &lt;Cell &#39;Sheet 1&#39;.A2&gt;)</span>
1794 <span class="go">(&lt;Cell &#39;Sheet 1&#39;.B1&gt;, &lt;Cell &#39;Sheet 1&#39;.B2&gt;)</span>
1795 <span class="go">(&lt;Cell &#39;Sheet 1&#39;.C1&gt;, &lt;Cell &#39;Sheet 1&#39;.C2&gt;)</span>
1796 </pre></div>
1797 
1798 <p>You&rsquo;ll notice that in the first example, when iterating through the rows using <code>.iter_rows()</code>, you get one <code>tuple</code> element per row selected. While when using <code>.iter_cols()</code> and iterating through columns, you&rsquo;ll get one <code>tuple</code> per column instead.</p>
1799 <p>Both methods also accept the Boolean argument <code>values_only</code>. When it&rsquo;s set to <code>True</code>, you get the cell values instead of the <code>Cell</code> objects:</p>
1800 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
1801 <span class="gp">... </span>                             <span class="n">max_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
1802 <span class="gp">... </span>                             <span class="n">min_col</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
1803 <span class="gp">... </span>                             <span class="n">max_col</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
1804 <span class="gp">... </span>                             <span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
1805 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
1806 <span class="go">(&#39;marketplace&#39;, &#39;customer_id&#39;, &#39;review_id&#39;)</span>
1807 <span class="go">(&#39;US&#39;, 3653882, &#39;R3O9SGZBVQBV76&#39;)</span>
1808 </pre></div>
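<p><code>.iter_cols()</code> walks the same cells column by column, so over the same boundaries its output is the transpose of what <code>.iter_rows()</code> gives you. This sketch confirms that with <code>zip(*rows)</code>, using an in-memory sheet that holds a copy of the two rows shown above:</p>

```python
from openpyxl import Workbook

sheet = Workbook().active
# Copy of the first two rows and three columns shown above
sheet.append(["marketplace", "customer_id", "review_id"])
sheet.append(["US", 3653882, "R3O9SGZBVQBV76"])

bounds = {"min_row": 1, "max_row": 2, "min_col": 1, "max_col": 3}
rows = list(sheet.iter_rows(**bounds, values_only=True))
columns = list(sheet.iter_cols(**bounds, values_only=True))

print(rows)     # two tuples with three values each
print(columns)  # three tuples with two values each
print(columns == list(zip(*rows)))  # True: columns are the transposed rows
```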
1809 
1810 <p>If you want to iterate through the whole dataset, then you can also use the attributes <code>.rows</code> or <code>.columns</code> directly, which are shortcuts to using <code>.iter_rows()</code> and <code>.iter_cols()</code> without any arguments:</p>
1811 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">rows</span><span class="p">:</span>
1812 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
1813 <span class="go">(&lt;Cell &#39;Sheet 1&#39;.A1&gt;, &lt;Cell &#39;Sheet 1&#39;.B1&gt;, &lt;Cell &#39;Sheet 1&#39;.C1&gt;</span>
1814 <span class="gp">...</span>
1815 <span class="go">&lt;Cell &#39;Sheet 1&#39;.M100&gt;, &lt;Cell &#39;Sheet 1&#39;.N100&gt;, &lt;Cell &#39;Sheet 1&#39;.O100&gt;)</span>
1816 </pre></div>
1817 
1818 <p>These shortcuts are very useful when you&rsquo;re iterating through the whole dataset.</p>
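<p>Keep in mind that <code>.rows</code> is an iterator, just like the result of <code>.iter_rows()</code>. You can call <code>next()</code> on it to take the header row and then loop over the rest. Here&rsquo;s a small sketch of that pattern, again on an in-memory copy of the sample&rsquo;s first rows and columns:</p>

```python
from openpyxl import Workbook

sheet = Workbook().active
sheet.append(["marketplace", "customer_id", "review_id"])
sheet.append(["US", 3653882, "R3O9SGZBVQBV76"])

# Each access to sheet.rows builds a fresh iterator, so keep a reference
rows = sheet.rows
header = [cell.value for cell in next(rows)]              # consumes row 1
records = [[cell.value for cell in row] for row in rows]  # rows 2 onward

print(header)
print(records)
```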
1819 <h4 id="manipulate-data-using-pythons-default-data-structures">Manipulate Data Using Python&rsquo;s Default Data Structures</h4>
1820 <p>Now that you know the basics of iterating through the data in a workbook, let&rsquo;s look at smart ways of converting that data into Python structures.</p>
1821 <p>As you saw earlier, the results of all these iterations come in the form of <code>tuples</code>. A <code>tuple</code> is essentially an immutable <code>list</code>: you can&rsquo;t change it in place, but you can easily access its data and transform it into other structures.</p>
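<p>Because every row arrives as a plain <code>tuple</code>, you can pair it with the header row using <code>zip()</code> to get one dictionary per record. You can also copy it into a <code>list</code> when you need something mutable. The rows below are hard-coded to mimic what <code>.iter_rows(values_only=True)</code> yields. They&rsquo;re a hypothetical stand-in and aren&rsquo;t read from <code>sample.xlsx</code>:</p>

```python
# Hard-coded rows shaped like iter_rows(values_only=True) output
rows = [
    ("marketplace", "customer_id", "review_id"),
    ("US", 3653882, "R3O9SGZBVQBV76"),
]

header, *records = rows

# One dictionary per record, keyed by column name
as_dicts = [dict(zip(header, record)) for record in records]
print(as_dicts)

# Tuples are immutable, so copy a row into a list before changing it
try:
    records[0][0] = "UK"
except TypeError:
    print("tuples can't be changed in place")

editable = list(records[0])
editable[0] = "UK"
print(editable)
```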
1822 <p>For example, say you want to extract product information from the <code>sample.xlsx</code> spreadsheet into a dictionary where each key is a product ID.</p>
1823 <p>A straightforward way to do this is to iterate over all the rows, pick the columns you know are related to product information, and then store that in a dictionary. Let&rsquo;s code this out!</p>
1824 <p>First of all, have a look at the headers and see what information you care most about:</p>
1825 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
1826 <span class="gp">... </span>                             <span class="n">max_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
1827 <span class="gp">... </span>                             <span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
1828 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
1829 <span class="go">(&#39;marketplace&#39;, &#39;customer_id&#39;, &#39;review_id&#39;, &#39;product_id&#39;, ...)</span>
1830 </pre></div>
1831 
1832 <p>This code prints a tuple with all the column names in the spreadsheet. To start, grab the following columns:</p>
1833 <ul>
1834 <li><code>product_id</code></li>
1835 <li><code>product_parent</code></li>
1836 <li><code>product_title</code></li>
1837 <li><code>product_category</code></li>
1838 </ul>
1839 <p>Lucky for you, the columns you need are all next to each other, so you can use the <code>min_col</code> and <code>max_col</code> arguments to get the data you want:</p>
1840 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
1841 <span class="gp">... </span>                             <span class="n">min_col</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
1842 <span class="gp">... </span>                             <span class="n">max_col</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span>
1843 <span class="gp">... </span>                             <span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
1844 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
1845 <span class="go">(&#39;B00FALQ1ZC&#39;, 937001370, &#39;Invicta Women\&#39;s 15150 &quot;Angel&quot; 18k Yellow...)</span>
1846 <span class="go">(&#39;B00D3RGO20&#39;, 484010722, &quot;Kenneth Cole New York Women&#39;s KC4944...)</span>
1847 <span class="gp">...</span>
1848 </pre></div>
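<p>Hard-coding <code>min_col=4</code> and <code>max_col=7</code> works for this dataset, but it silently picks the wrong data if someone inserts a column. As an alternative, here&rsquo;s a sketch that derives the boundaries from the header row instead. The two rows are copied from the sample&rsquo;s header and first product shown above, with the title kept truncated. Treat the in-memory sheet as a stand-in for <code>sample.xlsx</code>:</p>

```python
from openpyxl import Workbook

sheet = Workbook().active
sheet.append(["marketplace", "customer_id", "review_id", "product_id",
              "product_parent", "product_title", "product_category"])
sheet.append(["US", 3653882, "R3O9SGZBVQBV76", "B00FALQ1ZC",
              937001370, "Invicta Women's 15150 ...", "Watches"])

# Read the header row once and look the columns up by name
header = next(sheet.iter_rows(min_row=1, max_row=1, values_only=True))
first_col = header.index("product_id") + 1        # +1: columns start at 1
last_col = header.index("product_category") + 1

product_rows = list(sheet.iter_rows(min_row=2,
                                    min_col=first_col,
                                    max_col=last_col,
                                    values_only=True))
print(first_col, last_col)
print(product_rows)
```

<p>This still assumes that the product columns stay next to each other. If they might not, you can look up each column&rsquo;s position individually and index into full rows instead.</p>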
1849 
1850 <p>Nice! Now that you know how to get all the important product information you need, let&rsquo;s put that data into a dictionary:</p>
1851 <div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>
1852 <span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
1853 
1854 <span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample.xlsx&quot;</span><span class="p">)</span>
1855 <span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
1856 
1857 <span class="n">products</span> <span class="o">=</span> <span class="p">{}</span>
1858 
1859 <span class="c1"># Use values_only=True so that each row is a tuple of cell values</span>
1860 <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
1861                            <span class="n">min_col</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
1862                            <span class="n">max_col</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span>
1863                            <span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
1864     <span class="n">product_id</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
1865     <span class="n">product</span> <span class="o">=</span> <span class="p">{</span>
1866         <span class="s2">&quot;parent&quot;</span><span class="p">:</span> <span class="n">row</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span>
1867         <span class="s2">&quot;title&quot;</span><span class="p">:</span> <span class="n">row</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span>
1868         <span class="s2">&quot;category&quot;</span><span class="p">:</span> <span class="n">row</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span>
1869     <span class="p">}</span>
1870     <span class="n">products</span><span class="p">[</span><span class="n">product_id</span><span class="p">]</span> <span class="o">=</span> <span class="n">product</span>
1871 
1872 <span class="c1"># Use json to pretty-print the output with two-space indentation</span>
1873 <span class="nb">print</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">products</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
1874 </pre></div>
1875 
1876 <p>The code above prints JSON similar to this:</p>
1877 <div class="highlight json"><pre><span></span><span class="p">{</span>
1878   <span class="nt">&quot;B00FALQ1ZC&quot;</span><span class="p">:</span> <span class="p">{</span>
1879     <span class="nt">&quot;parent&quot;</span><span class="p">:</span> <span class="mi">937001370</span><span class="p">,</span>
1880     <span class="nt">&quot;title&quot;</span><span class="p">:</span> <span class="s2">&quot;Invicta Women&#39;s 15150 ...&quot;</span><span class="p">,</span>
1881     <span class="nt">&quot;category&quot;</span><span class="p">:</span> <span class="s2">&quot;Watches&quot;</span>
1882   <span class="p">},</span>
1883   <span class="nt">&quot;B00D3RGO20&quot;</span><span class="p">:</span> <span class="p">{</span>
1884     <span class="nt">&quot;parent&quot;</span><span class="p">:</span> <span class="mi">484010722</span><span class="p">,</span>
1885     <span class="nt">&quot;title&quot;</span><span class="p">:</span> <span class="s2">&quot;Kenneth Cole New York ...&quot;</span><span class="p">,</span>
1886     <span class="nt">&quot;category&quot;</span><span class="p">:</span> <span class="s2">&quot;Watches&quot;</span>
1887   <span class="p">}</span>
1888 <span class="p">}</span>
1889 </pre></div>
1890 
1891 <p>The output above is trimmed to two products, but if you run the script as is, then you should get 98 products.</p>
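<p>If you prefer to avoid positional lookups such as <code>row[1]</code>, you can unpack each row tuple straight into named variables inside a dictionary comprehension. Here&rsquo;s a minimal sketch of that variation. The two rows are hard-coded copies of the trimmed output above, standing in for <code>sheet.iter_rows(min_row=2, min_col=4, max_col=7, values_only=True)</code>:</p>

```python
import json

# Stand-in for sheet.iter_rows(min_row=2, min_col=4, max_col=7, values_only=True)
rows = [
    ("B00FALQ1ZC", 937001370, "Invicta Women's 15150 ...", "Watches"),
    ("B00D3RGO20", 484010722, "Kenneth Cole New York ...", "Watches"),
]

# Unpack each tuple into named variables instead of indexing row[0]..row[3]
products = {
    product_id: {"parent": parent, "title": title, "category": category}
    for product_id, parent, title, category in rows
}

print(json.dumps(products, indent=2))
```

<p>Because the dictionary is keyed by product ID, a product that appears on several rows ends up as a single entry, with each later row overwriting the earlier one. Keep that in mind when you compare the number of products with the number of rows in the sheet.</p>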
1892 <h4 id="convert-data-into-python-classes">Convert Data Into Python Classes</h4>
1893 <p>To wrap up the reading section of this tutorial, let&rsquo;s dive into Python classes and see how you can improve on the example above by giving the data a better structure.</p>
1894 <p>For this, you&rsquo;ll be using Python <a href="https://realpython.com/python-data-classes/">data classes</a>, which are available starting with Python 3.7. If you&rsquo;re using an older version of Python, then you can use regular <a href="https://realpython.com/python3-object-oriented-programming/#classes-in-python">classes</a> instead.</p>
1895 <p>So, first things first, let&rsquo;s look at the data you have and decide what you want to store and how you want to store it.</p>
1896 <p>As you saw right at the start, this data comes from Amazon, and it&rsquo;s a list of product reviews. You can check the <a href="https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt">list of all the columns and their meaning</a> on Amazon.</p>
1897 <p>There are two significant elements you can extract from the data available:</p>
1898 <ol>
1899 <li>Products</li>
1900 <li>Reviews</li>
1901 </ol>
1902 <p>A <strong>Product</strong> has:</p>
1903 <ul>
1904 <li>ID</li>
1905 <li>Title</li>
1906 <li>Parent</li>
1907 <li>Category</li>
1908 </ul>
1909 <p>A <strong>Review</strong> has a few more fields:</p>
1910 <ul>
1911 <li>ID</li>
1912 <li>Customer ID</li>
1913 <li>Stars</li>
1914 <li>Headline</li>
1915 <li>Body</li>
1916 <li>Date</li>
1917 </ul>
1918 <p>You can ignore a few of the review fields to make things a bit simpler.</p>
1919 <p>So, a straightforward implementation of these two classes could be written in a separate file <code>classes.py</code>:</p>
1920 <div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">datetime</span>
1921 <span class="kn">from</span> <span class="nn">dataclasses</span> <span class="k">import</span> <span class="n">dataclass</span>
1922 
1923 <span class="nd">@dataclass</span>
1924 <span class="k">class</span> <span class="nc">Product</span><span class="p">:</span>
1925     <span class="nb">id</span><span class="p">:</span> <span class="nb">str</span>
1926     <span class="n">parent</span><span class="p">:</span> <span class="nb">str</span>
1927     <span class="n">title</span><span class="p">:</span> <span class="nb">str</span>
1928     <span class="n">category</span><span class="p">:</span> <span class="nb">str</span>
1929 
1930 <span class="nd">@dataclass</span>
1931 <span class="k">class</span> <span class="nc">Review</span><span class="p">:</span>
1932     <span class="nb">id</span><span class="p">:</span> <span class="nb">str</span>
1933     <span class="n">customer_id</span><span class="p">:</span> <span class="nb">str</span>
1934     <span class="n">stars</span><span class="p">:</span> <span class="nb">int</span>
1935     <span class="n">headline</span><span class="p">:</span> <span class="nb">str</span>
1936     <span class="n">body</span><span class="p">:</span> <span class="nb">str</span>
1937     <span class="n">date</span><span class="p">:</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span>
1938 </pre></div>
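<p>Keep in mind that data classes don&rsquo;t validate their type annotations at runtime. The minimal sketch below, using values taken from this tutorial&rsquo;s sample output, shows that <code>Product</code> accepts an integer <code>parent</code> even though the field is annotated as <code>str</code>. That&rsquo;s why the output later in this section shows <code>parent=937001370</code> as a number. <code>StrictProduct</code> is a hypothetical variant that coerces the value in <code>__post_init__()</code> if you need the annotation to hold.</p>

```python
from dataclasses import dataclass


@dataclass
class Product:
    id: str
    parent: str
    title: str
    category: str


# Dataclasses store values as-is: no type checks or conversions happen
product = Product(id="B00FALQ1ZC", parent=937001370,
                  title="Invicta Women's 15150 ...", category="Watches")
print(type(product.parent).__name__)  # int, not str


@dataclass
class StrictProduct:
    id: str
    parent: str
    title: str
    category: str

    def __post_init__(self):
        # Runs after the generated __init__: coerce parent to a string
        self.parent = str(self.parent)


strict = StrictProduct(id="B00FALQ1ZC", parent=937001370,
                       title="Invicta Women's 15150 ...", category="Watches")
print(repr(strict.parent))  # '937001370'
```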
1939 
1940 <p>After defining your data classes, you need to convert the data from the spreadsheet into these new structures.</p>
1941 <p>Before doing the conversion, it&rsquo;s worth looking at our header again and creating a mapping between columns and the fields you need:</p>
1942 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
1943 <span class="gp">... </span>                             <span class="n">max_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
1944 <span class="gp">... </span>                             <span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
1945 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
1946 <span class="go">(&#39;marketplace&#39;, &#39;customer_id&#39;, &#39;review_id&#39;, &#39;product_id&#39;, ...)</span>
1947 
1948 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Or an alternative</span>
1949 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">sheet</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
1950 <span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">cell</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
1951 <span class="go">marketplace</span>
1952 <span class="go">customer_id</span>
1953 <span class="go">review_id</span>
1954 <span class="go">product_id</span>
1955 <span class="go">product_parent</span>
1956 <span class="gp">...</span>
1957 </pre></div>
1958 
1959 <p>Let&rsquo;s create a file <code>mapping.py</code> that lists all the field names and their column positions (zero-indexed) on the spreadsheet:</p>
1960 <div class="highlight python"><pre><span></span><span class="c1"># Product fields</span>
1961 <span class="n">PRODUCT_ID</span> <span class="o">=</span> <span class="mi">3</span>
1962 <span class="n">PRODUCT_PARENT</span> <span class="o">=</span> <span class="mi">4</span>
1963 <span class="n">PRODUCT_TITLE</span> <span class="o">=</span> <span class="mi">5</span>
1964 <span class="n">PRODUCT_CATEGORY</span> <span class="o">=</span> <span class="mi">6</span>
1965 
1966 <span class="c1"># Review fields</span>
1967 <span class="n">REVIEW_ID</span> <span class="o">=</span> <span class="mi">2</span>
1968 <span class="n">REVIEW_CUSTOMER</span> <span class="o">=</span> <span class="mi">1</span>
1969 <span class="n">REVIEW_STARS</span> <span class="o">=</span> <span class="mi">7</span>
1970 <span class="n">REVIEW_HEADLINE</span> <span class="o">=</span> <span class="mi">12</span>
1971 <span class="n">REVIEW_BODY</span> <span class="o">=</span> <span class="mi">13</span>
1972 <span class="n">REVIEW_DATE</span> <span class="o">=</span> <span class="mi">14</span>
1973 </pre></div>
1974 
1975 <p>You don&rsquo;t necessarily have to do the mapping above. It&rsquo;s more for readability when parsing the row data, so you don&rsquo;t end up with a lot of magic numbers lying around.</p>
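<p>If you&rsquo;d rather not hard-code the column positions at all, one option is to derive them from the header row. The sketch below assumes the header is available as a tuple of column names, like the one returned by <code>iter_rows(min_row=1, max_row=1, values_only=True)</code> earlier. Only the first five column names shown above are used, <code>build_mapping()</code> is a hypothetical helper, and the sample row is illustrative.</p>

```python
def build_mapping(header):
    """Map each column name to its zero-based position in a row tuple."""
    return {name: index for index, name in enumerate(header)}


# First five column names from the header shown above
header = ("marketplace", "customer_id", "review_id", "product_id", "product_parent")
columns = build_mapping(header)

# Illustrative row; real rows come from sheet.iter_rows(values_only=True)
row = ("US", 3653882, "R3O9SGZBVQBV76", "B00FALQ1ZC", 937001370)
print(columns["product_id"], row[columns["product_id"]])  # 3 B00FALQ1ZC
```

<p>This keeps working even if the column order changes, as long as the header names stay the same.</p>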
1976 <p>Finally, let&rsquo;s look at the code needed to parse the spreadsheet data into a list of product and review objects:</p>
1977 <div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">datetime</span> <span class="k">import</span> <span class="n">datetime</span>
1978 <span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
1979 <span class="kn">from</span> <span class="nn">classes</span> <span class="k">import</span> <span class="n">Product</span><span class="p">,</span> <span class="n">Review</span>
1980 <span class="kn">from</span> <span class="nn">mapping</span> <span class="k">import</span> <span class="n">PRODUCT_ID</span><span class="p">,</span> <span class="n">PRODUCT_PARENT</span><span class="p">,</span> <span class="n">PRODUCT_TITLE</span><span class="p">,</span> \
1981     <span class="n">PRODUCT_CATEGORY</span><span class="p">,</span> <span class="n">REVIEW_DATE</span><span class="p">,</span> <span class="n">REVIEW_ID</span><span class="p">,</span> <span class="n">REVIEW_CUSTOMER</span><span class="p">,</span> \
1982     <span class="n">REVIEW_STARS</span><span class="p">,</span> <span class="n">REVIEW_HEADLINE</span><span class="p">,</span> <span class="n">REVIEW_BODY</span>
1983 
1984 <span class="c1"># Use read_only mode since you won&#39;t be editing the spreadsheet</span>
1985 <span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample.xlsx&quot;</span><span class="p">,</span> <span class="n">read_only</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
1986 <span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
1987 
1988 <span class="n">products</span> <span class="o">=</span> <span class="p">[]</span>
1989 <span class="n">reviews</span> <span class="o">=</span> <span class="p">[]</span>
1990 
1991 <span class="c1"># Use values_only because you only need the cell values, not the Cell objects</span>
1992 <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
1993     <span class="n">product</span> <span class="o">=</span> <span class="n">Product</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">PRODUCT_ID</span><span class="p">],</span>
1994                       <span class="n">parent</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">PRODUCT_PARENT</span><span class="p">],</span>
1995                       <span class="n">title</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">PRODUCT_TITLE</span><span class="p">],</span>
1996                       <span class="n">category</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">PRODUCT_CATEGORY</span><span class="p">])</span>
1997     <span class="n">products</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">product</span><span class="p">)</span>
1998 
1999     <span class="c1"># You need to parse the date from the spreadsheet into a datetime format</span>
2000     <span class="n">spread_date</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="n">REVIEW_DATE</span><span class="p">]</span>
2001     <span class="n">parsed_date</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">strptime</span><span class="p">(</span><span class="n">spread_date</span><span class="p">,</span> <span class="s2">&quot;%Y-%m-</span><span class="si">%d</span><span class="s2">&quot;</span><span class="p">)</span>
2002 
2003     <span class="n">review</span> <span class="o">=</span> <span class="n">Review</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_ID</span><span class="p">],</span>
2004                     <span class="n">customer_id</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_CUSTOMER</span><span class="p">],</span>
2005                     <span class="n">stars</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_STARS</span><span class="p">],</span>
2006                     <span class="n">headline</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_HEADLINE</span><span class="p">],</span>
2007                     <span class="n">body</span><span class="o">=</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_BODY</span><span class="p">],</span>
2008                     <span class="n">date</span><span class="o">=</span><span class="n">parsed_date</span><span class="p">)</span>
2009     <span class="n">reviews</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">review</span><span class="p">)</span>
2010 
2011 <span class="nb">print</span><span class="p">(</span><span class="n">products</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
2012 <span class="nb">print</span><span class="p">(</span><span class="n">reviews</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
2013 </pre></div>
2014 
2015 <p>After you run the code above, you should get some output like this:</p>
2016 <div class="highlight python"><pre><span></span><span class="n">Product</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s1">&#39;B00FALQ1ZC&#39;</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="mi">937001370</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
2017 <span class="n">Review</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s1">&#39;R3O9SGZBVQBV76&#39;</span><span class="p">,</span> <span class="n">customer_id</span><span class="o">=</span><span class="mi">3653882</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
2018 </pre></div>
2019 
2020 <p>That&rsquo;s it! Now you should have the data in a simple, digestible class format, and you can start thinking about storing it in a <a href="https://realpython.com/tutorials/databases/">database</a> or any other type of data storage you like.</p>
2021 <p>Using this kind of OOP strategy to parse spreadsheets makes handling the data much simpler later on.</p>
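<p>One caveat about the <code>strptime()</code> call in the parsing code: it only works when the date cell holds text such as <code>2015-08-31</code>. If a cell is formatted as a date in the spreadsheet, openpyxl typically returns a <code>datetime</code> object instead, and <code>strptime()</code> then raises a <code>TypeError</code>. A hypothetical <code>parse_date()</code> helper like the one below accepts both:</p>

```python
from datetime import date, datetime


def parse_date(value, fmt="%Y-%m-%d"):
    """Return a datetime whether the cell held text or a date value."""
    # Check datetime first, because datetime is a subclass of date
    if isinstance(value, datetime):
        return value  # Already converted by openpyxl
    if isinstance(value, date):
        # Promote a plain date to a datetime at midnight
        return datetime(value.year, value.month, value.day)
    # Otherwise assume a string in the given format
    return datetime.strptime(value, fmt)


print(parse_date("2015-08-31"))
print(parse_date(datetime(2015, 8, 31)))
```

<p>Inside the loop, you&rsquo;d then write <code>parsed_date = parse_date(row[REVIEW_DATE])</code>.</p>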
2022 <h3 id="appending-new-data">Appending New Data</h3>
2023 <p>Before you start creating very complex spreadsheets, have a quick look at an example of how to append data to an existing spreadsheet.</p>
2024 <p>Go back to the first example spreadsheet you created (<code>hello_world.xlsx</code>) and try opening it and appending some data to it, like this:</p>
2025 <div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
2026 
2027 <span class="c1"># Start by opening the spreadsheet and selecting the main sheet</span>
2028 <span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;hello_world.xlsx&quot;</span><span class="p">)</span>
2029 <span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
2030 
2031 <span class="c1"># Write what you want into a specific cell</span>
2032 <span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;C1&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;writing ;)&quot;</span>
2033 
2034 <span class="c1"># Save the spreadsheet</span>
2035 <span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;hello_world_append.xlsx&quot;</span><span class="p">)</span>
2036 </pre></div>
2037 
2038 <p><em>Et voilà</em>, if you open the new <code>hello_world_append.xlsx</code> spreadsheet, you&rsquo;ll see the following change:</p>
2039 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_17.44.22.e4f18e5abc42.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_17.44.22.e4f18e5abc42.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_17.44.22.e4f18e5abc42.png&amp;w=540&amp;sig=098886279b90048004feb6dcdbe1c66ac3e231ce 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_17.44.22.e4f18e5abc42.png&amp;w=1080&amp;sig=8619e04c109779499f96dcd8aee01c4cf1ed52eb 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_17.44.22.e4f18e5abc42.png 2160w" sizes="75vw" alt="Appending Data to a Spreadsheet"/></a></p>
2040 <p>Notice the additional <em>writing ;)</em> on cell <code>C1</code>.</p>
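<p>Writing to a specific cell is one option. To add a whole row at the bottom of a sheet, openpyxl worksheets also provide <code>.append()</code>, which takes an iterable of values. The sketch below rebuilds a small workbook like <code>hello_world.xlsx</code> and keeps it in an <code>io.BytesIO</code> buffer instead of a file on disk, purely so the example is self-contained:</p>

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a workbook like hello_world.xlsx, but keep it in memory
original = Workbook()
original.active["A1"] = "hello"
original.active["B1"] = "world!"
buffer = BytesIO()
original.save(buffer)

# Reopen it, then append a row below the last row with data
buffer.seek(0)
workbook = load_workbook(buffer)
sheet = workbook.active
sheet.append(["appended", "row"])

for row in sheet.iter_rows(values_only=True):
    print(row)
# ('hello', 'world!')
# ('appended', 'row')
```

<p>As with cell assignments, the appended row only reaches a file once you call <code>workbook.save()</code>.</p>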
2041 <h2 id="writing-excel-spreadsheets-with-openpyxl">Writing Excel Spreadsheets With openpyxl</h2>
2042 <p>There are a lot of different things you can write to a spreadsheet, from simple text or number values to complex formulas, charts, or even images.</p>
2043 <p>Let&rsquo;s start creating some spreadsheets!</p>
2044 <h3 id="creating-a-simple-spreadsheet">Creating a Simple Spreadsheet</h3>
2045 <p>Previously, you saw a very quick example of how to write &ldquo;Hello world!&rdquo; into a spreadsheet, so you can start with that:</p>
2046 <div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
2047 <span class="lineno"> 2 </span>
2048 <span class="lineno"> 3 </span><span class="n">filename</span> <span class="o">=</span> <span class="s2">&quot;hello_world.xlsx&quot;</span>
2049 <span class="lineno"> 4 </span>
2050 <span class="lineno"> 5 </span><span class="hll"><span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
2051 </span><span class="lineno"> 6 </span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
2052 <span class="lineno"> 7 </span>
2053 <span class="lineno"> 8 </span><span class="hll"><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A1&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;hello&quot;</span>
2054 </span><span class="lineno"> 9 </span><span class="hll"><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;B1&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;world!&quot;</span>
2055 </span><span class="lineno">10 </span>
2056 <span class="lineno">11 </span><span class="hll"><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="n">filename</span><span class="p">)</span>
2057 </span></pre></div>
2058 
2059 <p>The highlighted lines in the code above are the most important ones for writing. In the code, you can see that:</p>
2060 <ul>
2061 <li><strong>Line 5</strong> shows you how to create a new empty workbook.</li>
2062 <li><strong>Lines 8 and 9</strong> show you how to add data to specific cells.</li>
2063 <li><strong>Line 11</strong> shows you how to save the spreadsheet when you&rsquo;re done.</li>
2064 </ul>
2065 <p>Even though these lines are straightforward, it&rsquo;s still good to know them well for when things get more complicated.</p>
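<p>Besides the <code>&quot;A1&quot;</code>-style keys used on lines 8 and 9 of the example above, a worksheet also lets you address cells by numeric row and column with <code>.cell()</code>, which is handy inside loops. Both indices are 1-based. A minimal sketch that keeps the workbook in memory instead of saving it:</p>

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

workbook = Workbook()
sheet = workbook.active

# Same result as sheet["A1"] = "hello" and sheet["B1"] = "world!"
sheet.cell(row=1, column=1, value="hello")
sheet.cell(row=1, column=2, value="world!")

# Numeric columns map to letters: 1 -> "A", 2 -> "B", 28 -> "AB"
print(get_column_letter(2), sheet["B1"].value)  # B world!
```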
2066 <div class="alert alert-primary" role="alert">
2067 <p><strong>Note:</strong> You&rsquo;ll be using the <code>hello_world.xlsx</code> spreadsheet for some of the upcoming examples, so keep it handy.</p>
2068 </div>
2069 <p>To help with the upcoming code examples, add the following function to your Python file or console:</p>
2070 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">print_rows</span><span class="p">():</span>
2071 <span class="gp">... </span>    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">values_only</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
2072 <span class="gp">... </span>        <span class="nb">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
2073 </pre></div>
2074 
2075 <p>It makes it easier to print all of your spreadsheet values by just calling <code>print_rows()</code>.</p>
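<p><code>print_rows()</code> relies on a global <code>sheet</code> variable. If you&rsquo;d rather pass the worksheet explicitly, a variant like the hypothetical <code>get_rows()</code> below returns the rows as a list, which also makes them easy to compare in tests:</p>

```python
from openpyxl import Workbook


def get_rows(sheet):
    """Return every row of the worksheet as a tuple of cell values."""
    return list(sheet.iter_rows(values_only=True))


workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "hello"
sheet["B1"] = "world!"

for row in get_rows(sheet):
    print(row)  # ('hello', 'world!')
```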
2076 <h3 id="basic-spreadsheet-operations">Basic Spreadsheet Operations</h3>
2077 <p>Before you get into the more advanced topics, it&rsquo;s good for you to know how to manage the simplest elements of a spreadsheet.</p>
2078 <h4 id="adding-and-updating-cell-values">Adding and Updating Cell Values</h4>
2079 <p>You already learned how to add values to a spreadsheet like this:</p>
2080 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A1&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;value&quot;</span>
2081 </pre></div>
2082 
2083 <p>There&rsquo;s another way you can do this, by first selecting a cell and then changing its value:</p>
2084 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">cell</span> <span class="o">=</span> <span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A1&quot;</span><span class="p">]</span>
2085 <span class="gp">&gt;&gt;&gt; </span><span class="n">cell</span>
2086 <span class="go">&lt;Cell &#39;Sheet&#39;.A1&gt;</span>
2087 
2088 <span class="gp">&gt;&gt;&gt; </span><span class="n">cell</span><span class="o">.</span><span class="n">value</span>
2089 <span class="go">&#39;hello&#39;</span>
2090 
2091 <span class="gp">&gt;&gt;&gt; </span><span class="n">cell</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="s2">&quot;hey&quot;</span>
2092 <span class="gp">&gt;&gt;&gt; </span><span class="n">cell</span><span class="o">.</span><span class="n">value</span>
2093 <span class="go">&#39;hey&#39;</span>
2094 </pre></div>
2095 
2096 <p>The new value is only written to the file once you call <code>workbook.save()</code>.</p>
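<p>To see this in action, the sketch below saves a workbook to an in-memory buffer (<code>io.BytesIO</code> stands in for a file on disk), changes a cell afterward without saving again, and then reloads the buffer. The reloaded copy still holds the old value:</p>

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "hello"

buffer = BytesIO()
workbook.save(buffer)  # The saved copy contains "hello"

sheet["A1"].value = "hey"  # Changed in memory only, not saved yet

buffer.seek(0)
reloaded = load_workbook(buffer)
print(sheet["A1"].value, reloaded.active["A1"].value)  # hey hello
```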
2097 <p><code>openpyxl</code> creates a cell when you add a value to it, if that cell didn&rsquo;t exist before:</p>
2098 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="c1"># Before, our spreadsheet has only 1 row</span>
2099 <span class="gp">&gt;&gt;&gt; </span><span class="n">print_rows</span><span class="p">()</span>
2100 <span class="go">(&#39;hello&#39;, &#39;world!&#39;)</span>
2101 
2102 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Try adding a value to row 10</span>
2103 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;B10&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;test&quot;</span>
2104 <span class="gp">&gt;&gt;&gt; </span><span class="n">print_rows</span><span class="p">()</span>
2105 <span class="go">(&#39;hello&#39;, &#39;world!&#39;)</span>
2106 <span class="go">(None, None)</span>
2107 <span class="go">(None, None)</span>
2108 <span class="go">(None, None)</span>
2109 <span class="go">(None, None)</span>
2110 <span class="go">(None, None)</span>
2111 <span class="go">(None, None)</span>
2112 <span class="go">(None, None)</span>
2113 <span class="go">(None, None)</span>
2114 <span class="go">(None, &#39;test&#39;)</span>
2115 </pre></div>
2116 
2117 <p>As you can see, when you add a value to cell <code>B10</code>, you end up with 10 rows, with rows 2 through 9 left empty (<code>None</code>), just so you can have that <em>test</em> value.</p>
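<p>A related gotcha: in the default (non-read-only) mode, openpyxl also creates a cell when you merely <em>access</em> it, even without assigning a value. That can silently grow the sheet&rsquo;s dimensions, as reported by <code>.max_row</code>:</p>

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "hello"
print(sheet.max_row)  # 1

# Only reading B10 still creates the (empty) cell
value = sheet["B10"].value
print(value, sheet.max_row)  # None 10
```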
2118 <h4 id="managing-rows-and-columns">Managing Rows and Columns</h4>
2119 <p>One of the most common things you have to do when manipulating spreadsheets is adding or removing rows and columns. The <code>openpyxl</code> package allows you to do that in a very straightforward way by using the methods:</p>
2120 <ul>
2121 <li><code>.insert_rows()</code></li>
2122 <li><code>.delete_rows()</code></li>
2123 <li><code>.insert_cols()</code></li>
2124 <li><code>.delete_cols()</code></li>
2125 </ul>
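2126 <p>Each of those methods accepts two arguments:</p>
2127 <ol>
2128 <li><code>idx</code>: the index of the row or column where the insertion or deletion happens</li>
2129 <li><code>amount</code>: how many rows or columns to insert or delete (defaults to <code>1</code>)</li>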
2130 </ol>
2131 <p>Using our basic <code>hello_world.xlsx</code> example again, let&rsquo;s see how these methods work:</p>
2132 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">print_rows</span><span class="p">()</span>
2133 <span class="go">(&#39;hello&#39;, &#39;world!&#39;)</span>
2134 
2135 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Insert a column before the existing column 1 (&quot;A&quot;)</span>
2136 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">insert_cols</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
2137 <span class="gp">&gt;&gt;&gt; </span><span class="n">print_rows</span><span class="p">()</span>
2138 <span class="go">(None, &#39;hello&#39;, &#39;world!&#39;)</span>
2139 
2140 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Insert 5 columns between column 2 (&quot;B&quot;) and 3 (&quot;C&quot;)</span>
2141 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">insert_cols</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
2142 <span class="gp">&gt;&gt;&gt; </span><span class="n">print_rows</span><span class="p">()</span>
2143 <span class="go">(None, &#39;hello&#39;, None, None, None, None, None, &#39;world!&#39;)</span>
2144 
2145 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Delete the created columns</span>
2146 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">delete_cols</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
2147 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">delete_cols</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
2148 <span class="gp">&gt;&gt;&gt; </span><span class="n">print_rows</span><span class="p">()</span>
2149 <span class="go">(&#39;hello&#39;, &#39;world!&#39;)</span>
2150 
2151 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Insert a new row in the beginning</span>
2152 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">insert_rows</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
2153 <span class="gp">&gt;&gt;&gt; </span><span class="n">print_rows</span><span class="p">()</span>
2154 <span class="go">(None, None)</span>
2155 <span class="go">(&#39;hello&#39;, &#39;world!&#39;)</span>
2156 
2157 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Insert 3 new rows in the beginning</span>
2158 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">insert_rows</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
2159 <span class="gp">&gt;&gt;&gt; </span><span class="n">print_rows</span><span class="p">()</span>
2160 <span class="go">(None, None)</span>
2161 <span class="go">(None, None)</span>
2162 <span class="go">(None, None)</span>
2163 <span class="go">(None, None)</span>
2164 <span class="go">(&#39;hello&#39;, &#39;world!&#39;)</span>
2165 
2166 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Delete the first 4 rows</span>
2167 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">delete_rows</span><span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
2168 <span class="gp">&gt;&gt;&gt; </span><span class="n">print_rows</span><span class="p">()</span>
2169 <span class="go">(&#39;hello&#39;, &#39;world!&#39;)</span>
2170 </pre></div>
2171 
2172 <p>The only thing you need to remember is that when inserting new data (rows or columns), the insertion happens <strong>before</strong> the <code>idx</code> parameter.</p>
2173 <p>So, if you do <code>insert_rows(1)</code>, it inserts a new row <strong>before</strong> the existing first row.</p>
2174 <p>It&rsquo;s the same for columns: when you call <code>insert_cols(2)</code>, it inserts a new column right <strong>before</strong> the already existing second column (<code>B</code>).</p>
2175 <p>However, when deleting rows or columns, <code>.delete_...</code> deletes data <strong>starting from</strong> the index passed as an argument.</p>
2176 <p>For example, when doing <code>delete_rows(2)</code> it deletes row <code>2</code>, and when doing <code>delete_cols(3)</code> it deletes the third column (<code>C</code>).</p>
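<p>To double-check the &ldquo;before&rdquo; versus &ldquo;starting from&rdquo; behavior yourself, here&rsquo;s a small self-contained sketch with a single column of labeled rows. It inserts one row at index 2 and then deletes it again. The <code>column_a()</code> helper is hypothetical and only there to make the output compact:</p>

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
for label in ("row 1", "row 2", "row 3"):
    sheet.append([label])


def column_a():
    """Return the values in column A, top to bottom."""
    return [row[0] for row in sheet.iter_rows(values_only=True)]


sheet.insert_rows(2)  # New empty row goes BEFORE the existing row 2
print(column_a())  # ['row 1', None, 'row 2', 'row 3']

sheet.delete_rows(2)  # Deletes starting FROM row 2 (amount defaults to 1)
print(column_a())  # ['row 1', 'row 2', 'row 3']
```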
2177 <h4 id="managing-sheets">Managing Sheets</h4>
2178 <p>Sheet management is also one of those things you might need to know, even though it might be something that you don&rsquo;t use that often.</p>
2179 <p>If you look back at the code examples from this tutorial, you&rsquo;ll notice the following recurring piece of code:</p>
2180 <div class="highlight python"><pre><span></span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
2181 </pre></div>
2182 
2183 <p>This selects the workbook&rsquo;s active sheet, which in a new workbook is its first (and only) sheet. However, if you&rsquo;re opening a spreadsheet with multiple sheets, then you can select a specific one by its title like this:</p>
2184 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="c1"># Let&#39;s say you have two sheets: &quot;Products&quot; and &quot;Company Sales&quot;</span>
2185 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
2186 <span class="go">[&#39;Products&#39;, &#39;Company Sales&#39;]</span>
2187 
2188 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># You can select a sheet using its title</span>
2189 <span class="gp">&gt;&gt;&gt; </span><span class="n">products_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="p">[</span><span class="s2">&quot;Products&quot;</span><span class="p">]</span>
2190 <span class="gp">&gt;&gt;&gt; </span><span class="n">sales_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="p">[</span><span class="s2">&quot;Company Sales&quot;</span><span class="p">]</span>
2191 </pre></div>
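<p>Two related tricks: a workbook is iterable, so you can loop over all of its worksheets without knowing their titles in advance, and you can assign a worksheet to <code>workbook.active</code> to change which sheet is active (the one selected when the file is opened). A minimal sketch that recreates the two sheets from the example above:</p>

```python
from openpyxl import Workbook

workbook = Workbook()
workbook.active.title = "Products"  # Rename the default sheet
workbook.create_sheet("Company Sales")

# Loop over every worksheet in order
for worksheet in workbook:
    print(worksheet.title)
# Products
# Company Sales

# Make "Company Sales" the active sheet
workbook.active = workbook["Company Sales"]
print(workbook.active.title)  # Company Sales
```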
2192 
2193 <p>You can also change a sheet title very easily:</p>
2194 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
2195 <span class="go">[&#39;Products&#39;, &#39;Company Sales&#39;]</span>
2196 
2197 <span class="gp">&gt;&gt;&gt; </span><span class="n">products_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="p">[</span><span class="s2">&quot;Products&quot;</span><span class="p">]</span>
2198 <span class="gp">&gt;&gt;&gt; </span><span class="n">products_sheet</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">&quot;New Products&quot;</span>
2199 
2200 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
2201 <span class="go">[&#39;New Products&#39;, &#39;Company Sales&#39;]</span>
2202 </pre></div>
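<p>Excel limits worksheet names to 31 characters and rejects the characters <code>\ / ? * [ ] :</code>, so a title that is a perfectly valid Python string can still be refused. The sketch below checks a proposed title against those two rules before you assign it to <code>.title</code>. The helper name <code>is_valid_sheet_title</code> is hypothetical. <code>openpyxl</code> may apply its own checks when you set a title, and those may not match this sketch exactly:</p>
<div class="highlight python"><pre><span></span>import re

# Characters Excel does not allow in worksheet names: \ / ? * [ ] :
FORBIDDEN_TITLE_CHARS = re.compile(r"[\\/?*\[\]:]")
# Excel's maximum worksheet name length
MAX_TITLE_LENGTH = 31


def is_valid_sheet_title(title):
    """Check a proposed worksheet title against Excel's length and character rules.

    Other rules, such as titles being unique within a workbook, are not checked.
    """
    # Titles must be between 1 and 31 characters long
    if len(title) not in range(1, MAX_TITLE_LENGTH + 1):
        return False
    return FORBIDDEN_TITLE_CHARS.search(title) is None
</pre></div>
<p>For example, <code>is_valid_sheet_title("New Products")</code> returns <code>True</code>, while <code>is_valid_sheet_title("Sales/Q1")</code> returns <code>False</code> because of the slash.</p>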
2203 
2204 <p>If you want to create or delete sheets, then you can also do that with <code>.create_sheet()</code> and <code>.remove()</code>:</p>
2205 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
2206 <span class="go">[&#39;Products&#39;, &#39;Company Sales&#39;]</span>
2207 
2208 <span class="gp">&gt;&gt;&gt; </span><span class="n">operations_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">create_sheet</span><span class="p">(</span><span class="s2">&quot;Operations&quot;</span><span class="p">)</span>
2209 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
2210 <span class="go">[&#39;Products&#39;, &#39;Company Sales&#39;, &#39;Operations&#39;]</span>
2211 
2212 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># You can also define the position to create the sheet at</span>
2213 <span class="gp">&gt;&gt;&gt; </span><span class="n">hr_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">create_sheet</span><span class="p">(</span><span class="s2">&quot;HR&quot;</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
2214 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
2215 <span class="go">[&#39;HR&#39;, &#39;Products&#39;, &#39;Company Sales&#39;, &#39;Operations&#39;]</span>
2216 
2217 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># To remove them, just pass the sheet as an argument to the .remove()</span>
2218 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">operations_sheet</span><span class="p">)</span>
2219 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
2220 <span class="go">[&#39;HR&#39;, &#39;Products&#39;, &#39;Company Sales&#39;]</span>
2221 
2222 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">hr_sheet</span><span class="p">)</span>
2223 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
2224 <span class="go">[&#39;Products&#39;, &#39;Company Sales&#39;]</span>
2225 </pre></div>
2226 
2227 <p>You can also duplicate a sheet using <code>.copy_worksheet()</code>:</p>
2228 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
2229 <span class="go">[&#39;Products&#39;, &#39;Company Sales&#39;]</span>
2230 
2231 <span class="gp">&gt;&gt;&gt; </span><span class="n">products_sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="p">[</span><span class="s2">&quot;Products&quot;</span><span class="p">]</span>
2232 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">copy_worksheet</span><span class="p">(</span><span class="n">products_sheet</span><span class="p">)</span>
2233 <span class="go">&lt;Worksheet &quot;Products Copy&quot;&gt;</span>
2234 
2235 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">sheetnames</span>
2236 <span class="go">[&#39;Products&#39;, &#39;Company Sales&#39;, &#39;Products Copy&#39;]</span>
2237 </pre></div>
2238 
2239 <p>If you open your spreadsheet after saving the above code, you&rsquo;ll notice that the sheet <em>Products Copy</em> is a duplicate of the sheet <em>Products</em>.</p>
2240 <h4 id="freezing-rows-and-columns">Freezing Rows and Columns</h4>
2241 <p>Something that you might want to do when working with big spreadsheets is to freeze a few rows or columns, so they remain visible when you scroll right or down.</p>
2242 <p>Freezing data allows you to keep an eye on important rows or columns, regardless of where you scroll in the spreadsheet.</p>
2243 <p>Again, <code>openpyxl</code> has a way to accomplish this: the worksheet <code>freeze_panes</code> attribute. For this example, go back to our <code>sample.xlsx</code> spreadsheet and try the following:</p>
2244 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample.xlsx&quot;</span><span class="p">)</span>
2245 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
2246 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">freeze_panes</span> <span class="o">=</span> <span class="s2">&quot;C2&quot;</span>
2247 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">&quot;sample_frozen.xlsx&quot;</span><span class="p">)</span>
2248 </pre></div>
2249 
2250 <p>If you open the <code>sample_frozen.xlsx</code> spreadsheet in your favorite spreadsheet editor, you&rsquo;ll notice that row <code>1</code> and columns <code>A</code> and <code>B</code> are frozen and are always visible no matter where you navigate within the spreadsheet.</p>
2251 <p>This feature is handy, for example, to keep headers within sight, so you always know what each column represents.</p>
2252 <p>Here&rsquo;s how it looks in the editor:</p>
2253 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.12.20.55694a0781f8.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.12.20.55694a0781f8.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.12.20.55694a0781f8.png&amp;w=540&amp;sig=5826de23e5df2e08d625844698fc3a29b32ee7b2 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.12.20.55694a0781f8.png&amp;w=1080&amp;sig=c3abe2321f00372d975bbbf033f3ebe3687eb09f 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.12.20.55694a0781f8.png 2160w" sizes="75vw" alt="Example Spreadsheet With Frozen Rows and Columns"/></a></p>
2254 <p>Notice how you&rsquo;re at the end of the spreadsheet, and yet, you can see both row <code>1</code> and columns <code>A</code> and <code>B</code>.</p>
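<p>The anchor cell you assign to <code>freeze_panes</code> acts as a split point. Every column to its left and every row above it stays in place. The sketch below spells that rule out using only the standard library. <code>frozen_area</code> and its two helpers are hypothetical names. For the letter conversions, <code>openpyxl.utils</code> already provides <code>get_column_letter()</code> and <code>column_index_from_string()</code>:</p>
<div class="highlight python"><pre><span></span>import re


def column_index(letters):
    """Convert column letters to a 1-based index: A is 1, Z is 26, AA is 27."""
    index = 0
    for char in letters:
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index


def column_letters(index):
    """Convert a 1-based column index back to letters: 1 is A, 27 is AA."""
    letters = ""
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def frozen_area(anchor):
    """Return (frozen_columns, frozen_rows) for a freeze_panes anchor such as "C2".

    Columns to the left of the anchor and rows above it stay visible.
    """
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", anchor.upper())
    if match is None or int(match.group(2)) == 0:
        raise ValueError(f"not a cell coordinate: {anchor!r}")
    column = column_index(match.group(1))
    row = int(match.group(2))
    frozen_columns = [column_letters(i) for i in range(1, column)]
    frozen_rows = list(range(1, row))
    return frozen_columns, frozen_rows
</pre></div>
<p>With this helper, <code>frozen_area("C2")</code> returns <code>(["A", "B"], [1])</code>, which matches the editor screenshot. An anchor of <code>"A2"</code> would freeze only the header row.</p>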
2255 <h4 id="adding-filters">Adding Filters</h4>
2256 <p>You can use <code>openpyxl</code> to add filters and sorts to your spreadsheet. However, when you open the spreadsheet, the data won&rsquo;t be rearranged according to these sorts and filters.</p>
2257 <p>At first, this might seem like a pretty useless feature. But when you&rsquo;re programmatically creating a spreadsheet that someone else will receive and use, it&rsquo;s still nice to create the filters so people can use them afterward.</p>
2258 <p>The code below is an example of how you would add some filters to our existing <code>sample.xlsx</code> spreadsheet:</p>
2259 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="c1"># Check the used spreadsheet space using the attribute &quot;dimensions&quot;</span>
2260 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">dimensions</span>
2261 <span class="go">&#39;A1:O100&#39;</span>
2262 
2263 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">auto_filter</span><span class="o">.</span><span class="n">ref</span> <span class="o">=</span> <span class="s2">&quot;A1:O100&quot;</span>
2264 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample_with_filters.xlsx&quot;</span><span class="p">)</span>
2265 </pre></div>
2266 
2267 <p>You should now see the filters created when opening the spreadsheet in your editor:</p>
2268 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.20.35.5fdbfe805194.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.20.35.5fdbfe805194.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.20.35.5fdbfe805194.png&amp;w=540&amp;sig=c1d7ad4f2dfc03fc8730e3babf9000ac74170c7d 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.20.35.5fdbfe805194.png&amp;w=1080&amp;sig=27e888a967ddc112f1e824be671a15e2c111fe6c 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.20.35.5fdbfe805194.png 2160w" sizes="75vw" alt="Example Spreadsheet With Filters"/></a></p>
2269 <p>You don&rsquo;t have to use <code>sheet.dimensions</code> if you know precisely which part of the spreadsheet you want to apply filters to.</p>
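<p>If you already know the shape of your table, you can build the range yourself. For <code>sample.xlsx</code>, that means 15 columns (<code>A</code> through <code>O</code>) and a header row followed by 99 review rows. The helper below is a hypothetical sketch that assumes the table starts in column <code>A</code>. Its <code>column_letters()</code> function does the same job as <code>openpyxl.utils.get_column_letter()</code>:</p>
<div class="highlight python"><pre><span></span>def column_letters(index):
    """Convert a 1-based column index to letters: 1 is A, 15 is O, 27 is AA."""
    letters = ""
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def table_ref(n_columns, n_data_rows, header_row=1):
    """Return a range like "A1:O100" spanning a header row and its data rows.

    Assumes the table starts in column A.
    """
    last_row = header_row + n_data_rows
    return f"A{header_row}:{column_letters(n_columns)}{last_row}"
</pre></div>
<p>Calling <code>table_ref(15, 99)</code> returns <code>"A1:O100"</code>, which you could assign to <code>sheet.auto_filter.ref</code>.</p>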
2270 <h3 id="adding-formulas">Adding Formulas</h3>
2271 <p><strong>Formulas</strong> (or <strong>formulae</strong>) are one of the most powerful features of spreadsheets.</p>
2272 <p>They give you the power to apply specific calculations to a range of cells. Using formulas with <code>openpyxl</code> is as simple as editing the value of a cell.</p>
2273 <p>You can see the list of formulas supported by <code>openpyxl</code>:</p>
2274 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">openpyxl.utils</span> <span class="k">import</span> <span class="n">FORMULAE</span>
2275 <span class="gp">&gt;&gt;&gt; </span><span class="n">FORMULAE</span>
2276 <span class="go">frozenset({&#39;ABS&#39;,</span>
2277 <span class="go">           &#39;ACCRINT&#39;,</span>
2278 <span class="go">           &#39;ACCRINTM&#39;,</span>
2279 <span class="go">           &#39;ACOS&#39;,</span>
2280 <span class="go">           &#39;ACOSH&#39;,</span>
2281 <span class="go">           &#39;AMORDEGRC&#39;,</span>
2282 <span class="go">           &#39;AMORLINC&#39;,</span>
2283 <span class="go">           &#39;AND&#39;,</span>
2284 <span class="go">           ...</span>
2285 <span class="go">           &#39;YEARFRAC&#39;,</span>
2286 <span class="go">           &#39;YIELD&#39;,</span>
2287 <span class="go">           &#39;YIELDDISC&#39;,</span>
2288 <span class="go">           &#39;YIELDMAT&#39;,</span>
2289 <span class="go">           &#39;ZTEST&#39;})</span>
2290 </pre></div>
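<p>Because <code>FORMULAE</code> is a plain <code>frozenset</code> of function names, you can use it for a rough check before writing a formula to a cell. The helper name and its regular expression in the sketch below are our own assumptions. It collects anything that looks like a function call and reports the names missing from the set you pass in. It does not understand string literals inside formulas, so treat it as a quick sanity check rather than a parser:</p>
<div class="highlight python"><pre><span></span>import re

# A name made of letters, digits, or dots, immediately followed by "("
FUNCTION_CALL = re.compile(r"([A-Z][A-Z0-9.]*)\(")


def unsupported_functions(formula, supported):
    """Return the sorted function names used in formula that are not in supported."""
    used = set(FUNCTION_CALL.findall(formula.upper()))
    return sorted(used - set(supported))
</pre></div>
<p>In practice, you would pass <code>FORMULAE</code> as the <code>supported</code> argument, for example <code>unsupported_functions("=AVERAGE(H2:H100)", FORMULAE)</code>.</p>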
2291 
2292 <p>Let&rsquo;s add some formulas to our <code>sample.xlsx</code> spreadsheet.</p>
2293 <p>Starting with something easy, let&rsquo;s check the average star rating for the 99 reviews within the spreadsheet:</p>
2294 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="c1"># Star rating is column &quot;H&quot;</span>
2295 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;P2&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;=AVERAGE(H2:H100)&quot;</span>
2296 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample_formulas.xlsx&quot;</span><span class="p">)</span>
2297 </pre></div>
2298 
2299 <p>If you open the spreadsheet now and go to cell <code>P2</code>, you should see that its value is: <em>4.18181818181818</em>. Have a look in the editor:</p>
2300 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.33.09.7c2633f706cc.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.33.09.7c2633f706cc.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.33.09.7c2633f706cc.png&amp;w=540&amp;sig=5d7a9eb97acf524d5d2b9b93ae0e9214bbcf95c8 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.33.09.7c2633f706cc.png&amp;w=1080&amp;sig=8af67321cb101fb2f30fd0ca2bdcc62de35c9334 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.33.09.7c2633f706cc.png 2160w" sizes="75vw" alt="Example Spreadsheet With Average Formula"/></a></p>
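<p>Keep in mind that <code>openpyxl</code> only writes the formula text. The value <em>4.18181818181818</em> is calculated by your spreadsheet editor when it opens the file. If you want to double-check such a result from Python, you can recompute it from the raw cell values. The sketch below mimics how <code>AVERAGE</code> treats a cell range: it skips empty cells, text, and <code>TRUE</code>/<code>FALSE</code> values but still counts zeros. The function name is hypothetical. The input list could come from something like <code>[row[0] for row in sheet.iter_rows(min_row=2, max_row=100, min_col=8, max_col=8, values_only=True)]</code>:</p>
<div class="highlight python"><pre><span></span>def excel_average(values):
    """Average a list of cell values roughly the way Excel's AVERAGE treats a range.

    None (empty cells), strings, and booleans are ignored; zeros are counted.
    """
    numbers = [
        value
        for value in values
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]
    if not numbers:
        # Excel would display #DIV/0! for a range without numbers
        raise ZeroDivisionError("AVERAGE needs at least one numeric value")
    return sum(numbers) / len(numbers)
</pre></div>
<p>For the sample data, the result should agree with the value shown in <code>P2</code>.</p>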
2301 <p>You can use the same methodology to add any formulas to your spreadsheet. For example, let&rsquo;s count the number of reviews that had helpful votes:</p>
2302 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="c1"># The helpful votes are counted on column &quot;I&quot;</span>
2303 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;P3&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;=COUNTIF(I2:I100, &quot;&gt;0&quot;)&#39;</span>
2304 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample_formulas.xlsx&quot;</span><span class="p">)</span>
2305 </pre></div>
2306 
2307 <p>You should get the number <code>21</code> in cell <code>P3</code>:</p>
2308 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.35.24.e26e97b0c9c0.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.35.24.e26e97b0c9c0.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.35.24.e26e97b0c9c0.png&amp;w=540&amp;sig=0ec4c4c12a792a1a393e0273855282bfa0594d53 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.35.24.e26e97b0c9c0.png&amp;w=1080&amp;sig=67b399b8cb79ddbe7285da0325fe8f6b9edf3ecc 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.35.24.e26e97b0c9c0.png 2160w" sizes="75vw" alt="Example Spreadsheet With Average and CountIf Formula"/></a></p>
2309 <p>Strings inside a formula must always be wrapped in double quotes. In Python, that means you either put single quotes around the whole formula, as in the example above, or escape the double quotes inside it: <code>"=COUNTIF(I2:I100, \"&gt;0\")"</code>.</p>
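<p>If you generate formulas in code, a small helper can handle the quoting for you. The sketch below wraps the criteria in double quotes. It also doubles any double quote inside the criteria, which is how Excel escapes a quote within a string literal. Both function names are hypothetical:</p>
<div class="highlight python"><pre><span></span>def excel_string(text):
    """Return text as an Excel string literal, doubling any embedded double quotes."""
    return '"' + text.replace('"', '""') + '"'


def countif_formula(cell_range, criteria):
    """Build a COUNTIF formula whose criteria argument is a quoted Excel string."""
    return f"=COUNTIF({cell_range}, {excel_string(criteria)})"
</pre></div>
<p>For instance, <code>countif_formula("I2:I100", "&gt;0")</code> returns the same string as the single-quoted formula used above, so you could assign it directly to <code>sheet["P3"]</code>.</p>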
2310 <p>There are a ton of other formulas you can add to your spreadsheet using the same procedure you tried above. Give it a go yourself!</p>
2311 <h3 id="adding-styles">Adding Styles</h3>
2312 <p>Even though styling a spreadsheet might not be something you would do every day, it&rsquo;s still good to know how to do it.</p>
2313 <p>Using <code>openpyxl</code>, you can apply multiple styling options to your spreadsheet, including fonts, borders, colors, and so on. Have a look at the <code>openpyxl</code> <a href="https://openpyxl.readthedocs.io/en/stable/styles.html">documentation</a> to learn more.</p>
2314 <p>You can also choose to either apply a style directly to a cell or create a template and reuse it to apply styles to multiple cells.</p>
2315 <p>Let&rsquo;s start by having a look at simple cell styling, using our <code>sample.xlsx</code> again as the base spreadsheet:</p>
2316 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="c1"># Import necessary style classes</span>
2317 <span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">openpyxl.styles</span> <span class="k">import</span> <span class="n">Font</span><span class="p">,</span> <span class="n">Color</span><span class="p">,</span> <span class="n">Alignment</span><span class="p">,</span> <span class="n">Border</span><span class="p">,</span> <span class="n">Side</span><span class="p">,</span> <span class="n">colors</span>
2318 
2319 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Create a few styles</span>
2320 <span class="gp">&gt;&gt;&gt; </span><span class="n">bold_font</span> <span class="o">=</span> <span class="n">Font</span><span class="p">(</span><span class="n">bold</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
2321 <span class="gp">&gt;&gt;&gt; </span><span class="n">big_red_text</span> <span class="o">=</span> <span class="n">Font</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="n">colors</span><span class="o">.</span><span class="n">RED</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span>
2322 <span class="gp">&gt;&gt;&gt; </span><span class="n">center_aligned_text</span> <span class="o">=</span> <span class="n">Alignment</span><span class="p">(</span><span class="n">horizontal</span><span class="o">=</span><span class="s2">&quot;center&quot;</span><span class="p">)</span>
2323 <span class="gp">&gt;&gt;&gt; </span><span class="n">double_border_side</span> <span class="o">=</span> <span class="n">Side</span><span class="p">(</span><span class="n">border_style</span><span class="o">=</span><span class="s2">&quot;double&quot;</span><span class="p">)</span>
2324 <span class="gp">&gt;&gt;&gt; </span><span class="n">square_border</span> <span class="o">=</span> <span class="n">Border</span><span class="p">(</span><span class="n">top</span><span class="o">=</span><span class="n">double_border_side</span><span class="p">,</span>
2325 <span class="gp">... </span>                       <span class="n">right</span><span class="o">=</span><span class="n">double_border_side</span><span class="p">,</span>
2326 <span class="gp">... </span>                       <span class="n">bottom</span><span class="o">=</span><span class="n">double_border_side</span><span class="p">,</span>
2327 <span class="gp">... </span>                       <span class="n">left</span><span class="o">=</span><span class="n">double_border_side</span><span class="p">)</span>
2328 
2329 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Style some cells!</span>
2330 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A2&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">font</span> <span class="o">=</span> <span class="n">bold_font</span>
2331 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A3&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">font</span> <span class="o">=</span> <span class="n">big_red_text</span>
2332 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A4&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">alignment</span> <span class="o">=</span> <span class="n">center_aligned_text</span>
2333 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A5&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">border</span> <span class="o">=</span> <span class="n">square_border</span>
2334 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample_styles.xlsx&quot;</span><span class="p">)</span>
2335 </pre></div>
2336 
2337 <p>If you open your spreadsheet now, you should see a different style applied to each of the cells <code>A2</code> through <code>A5</code>:</p>
2338 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.43.15.e3aeb3fb06e3.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.43.15.e3aeb3fb06e3.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.43.15.e3aeb3fb06e3.png&amp;w=540&amp;sig=ecc21878006697a6135ae515442642a95ab2bfb6 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.43.15.e3aeb3fb06e3.png&amp;w=1080&amp;sig=6f0ef4a148f1ca5a588e0cb2c02b0c9aad4246f2 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.43.15.e3aeb3fb06e3.png 2160w" sizes="75vw" alt="Example Spreadsheet With Simple Cell Styles"/></a></p>
2339 <p>There you go. You got:</p>
2340 <ul>
2341 <li><strong>A2</strong> with the text in bold</li>
2342 <li><strong>A3</strong> with the text in red and in a bigger font size</li>
2343 <li><strong>A4</strong> with the text centered</li>
2344 <li><strong>A5</strong> with a square border around the text</li>
2345 </ul>
2346 <div class="alert alert-primary" role="alert">
2347 <p><strong>Note:</strong> For the colors, you can also pass a hex code directly, for example <code>Font(color="C70E0F")</code>.</p>
2348 </div>
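<p>Colors in <code>openpyxl</code> are hex strings. They can be six digits (<code>RRGGBB</code>) or eight digits with a leading alpha byte (<code>AARRGGBB</code>). If you start from decimal RGB values, a converter like the hypothetical <code>rgb_to_hex()</code> below builds a string you can pass as <code>Font(color=...)</code>:</p>
<div class="highlight python"><pre><span></span>def rgb_to_hex(red, green, blue, alpha=None):
    """Return an RRGGBB string, or AARRGGBB when alpha is given.

    Each channel must be an integer from 0 to 255.
    """
    channels = [red, green, blue] if alpha is None else [alpha, red, green, blue]
    for value in channels:
        if not isinstance(value, int) or value not in range(256):
            raise ValueError(f"color channel must be an int from 0 to 255: {value!r}")
    # Two uppercase hex digits per channel
    return "".join(f"{value:02X}" for value in channels)
</pre></div>
<p>Here, <code>rgb_to_hex(199, 14, 15)</code> returns <code>"C70E0F"</code>, the same color as in the note above.</p>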
2349 <p>You can also combine styles by simply adding them to the cell at the same time:</p>
2350 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="c1"># Reusing the same styles from the example above</span>
2351 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A6&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">alignment</span> <span class="o">=</span> <span class="n">center_aligned_text</span>
2352 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A6&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">font</span> <span class="o">=</span> <span class="n">big_red_text</span>
2353 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="p">[</span><span class="s2">&quot;A6&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">border</span> <span class="o">=</span> <span class="n">square_border</span>
2354 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample_styles.xlsx&quot;</span><span class="p">)</span>
2355 </pre></div>
2356 
2357 <p>Have a look at cell <code>A6</code> here:</p>
2358 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.46.04.314517930065.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.46.04.314517930065.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.46.04.314517930065.png&amp;w=540&amp;sig=290bbf523eb24ac8c9741daf86701ca57cad4b96 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.46.04.314517930065.png&amp;w=1080&amp;sig=9decedef154c2138e26b287f2186213142650f6e 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.46.04.314517930065.png 2160w" sizes="75vw" alt="Example Spreadsheet With Coupled Cell Styles"/></a></p>
2359 <p>When you want to apply multiple styles to one or several cells, you can use a <code>NamedStyle</code> class instead, which is like a style template that you can use over and over again. Have a look at the example below:</p>
2360 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">openpyxl.styles</span> <span class="k">import</span> <span class="n">NamedStyle</span>
2361 
2362 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Let&#39;s create a style template for the header row</span>
2363 <span class="gp">&gt;&gt;&gt; </span><span class="n">header</span> <span class="o">=</span> <span class="n">NamedStyle</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">&quot;header&quot;</span><span class="p">)</span>
2364 <span class="gp">&gt;&gt;&gt; </span><span class="n">header</span><span class="o">.</span><span class="n">font</span> <span class="o">=</span> <span class="n">Font</span><span class="p">(</span><span class="n">bold</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
2365 <span class="gp">&gt;&gt;&gt; </span><span class="n">header</span><span class="o">.</span><span class="n">border</span> <span class="o">=</span> <span class="n">Border</span><span class="p">(</span><span class="n">bottom</span><span class="o">=</span><span class="n">Side</span><span class="p">(</span><span class="n">border_style</span><span class="o">=</span><span class="s2">&quot;thin&quot;</span><span class="p">))</span>
2366 <span class="gp">&gt;&gt;&gt; </span><span class="n">header</span><span class="o">.</span><span class="n">alignment</span> <span class="o">=</span> <span class="n">Alignment</span><span class="p">(</span><span class="n">horizontal</span><span class="o">=</span><span class="s2">&quot;center&quot;</span><span class="p">,</span> <span class="n">vertical</span><span class="o">=</span><span class="s2">&quot;center&quot;</span><span class="p">)</span>
2367 
2368 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Now let&#39;s apply this to all first row (header) cells</span>
2369 <span class="gp">&gt;&gt;&gt; </span><span class="n">header_row</span> <span class="o">=</span> <span class="n">sheet</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
2370 <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">header_row</span><span class="p">:</span>
2371 <span class="gp">... </span>    <span class="n">cell</span><span class="o">.</span><span class="n">style</span> <span class="o">=</span> <span class="n">header</span>
2372 
2373 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample_styles.xlsx&quot;</span><span class="p">)</span>
2374 </pre></div>
2375 
2376 <p>If you open the spreadsheet now, you should see that its first row is bold, the text is aligned to the center, and there&rsquo;s a small bottom border! Have a look below:</p>
2377 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.48.33.4bc57d1b24d5.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.48.33.4bc57d1b24d5.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.48.33.4bc57d1b24d5.png&amp;w=540&amp;sig=199107a0c9ea60fbf1dfcc078a7680b43faeef3a 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.48.33.4bc57d1b24d5.png&amp;w=1080&amp;sig=af3e5225e36a24dea088e8c175da02050bb5dda9 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.48.33.4bc57d1b24d5.png 2160w" sizes="75vw" alt="Example Spreadsheet With Named Styles"/></a></p>
2378 <p>As you saw above, there are many styling options, and which ones you need depends on your use case. Feel free to check the <code>openpyxl</code> <a href="https://openpyxl.readthedocs.io/en/stable/styles.html">documentation</a> to see what else you can do.</p>
2379 <h3 id="conditional-formatting">Conditional Formatting</h3>
2380 <p>This feature is one of my personal favorites when it comes to adding styles to a spreadsheet.</p>
2381 <p>It&rsquo;s a much more powerful approach to styling because it dynamically applies styles according to how the data in the spreadsheet changes.</p>
2382 <p>In a nutshell, <strong>conditional formatting</strong> allows you to specify a list of styles to apply to a cell (or cell range) according to specific conditions.</p>
2383 <p>For example, a widespread use case is to have a balance sheet where all the negative totals are in red, and the positive ones are in green. This formatting makes it much more efficient to spot good vs bad periods.</p>
2384 <p>Without further ado, let&rsquo;s pick our favorite spreadsheet&mdash;<code>sample.xlsx</code>&mdash;and add some conditional formatting.</p>
2385 <p>You can start by adding a simple one that adds a red background to all reviews with fewer than 3 stars:</p>
2386 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">openpyxl.styles</span> <span class="k">import</span> <span class="n">PatternFill</span><span class="p">,</span> <span class="n">colors</span>
2387 <span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">openpyxl.styles.differential</span> <span class="k">import</span> <span class="n">DifferentialStyle</span>
2388 <span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">openpyxl.formatting.rule</span> <span class="k">import</span> <span class="n">Rule</span>
2389 
2390 <span class="gp">&gt;&gt;&gt; </span><span class="n">red_background</span> <span class="o">=</span> <span class="n">PatternFill</span><span class="p">(</span><span class="n">bgColor</span><span class="o">=</span><span class="n">colors</span><span class="o">.</span><span class="n">RED</span><span class="p">)</span>
2391 <span class="gp">&gt;&gt;&gt; </span><span class="n">diff_style</span> <span class="o">=</span> <span class="n">DifferentialStyle</span><span class="p">(</span><span class="n">fill</span><span class="o">=</span><span class="n">red_background</span><span class="p">)</span>
2392 <span class="gp">&gt;&gt;&gt; </span><span class="n">rule</span> <span class="o">=</span> <span class="n">Rule</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s2">&quot;expression&quot;</span><span class="p">,</span> <span class="n">dxf</span><span class="o">=</span><span class="n">diff_style</span><span class="p">)</span>
2393 <span class="gp">&gt;&gt;&gt; </span><span class="n">rule</span><span class="o">.</span><span class="n">formula</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;$H1&lt;3&quot;</span><span class="p">]</span>
2394 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">conditional_formatting</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">&quot;A1:O100&quot;</span><span class="p">,</span> <span class="n">rule</span><span class="p">)</span>
2395 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">&quot;sample_conditional_formatting.xlsx&quot;</span><span class="p">)</span>
2396 </pre></div>
2397 
2398 <p>Now you&rsquo;ll see all the reviews with a star rating below 3 marked with a red background:</p>
2399 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_18.55.41.17f234a186c6.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_18.55.41.17f234a186c6.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.55.41.17f234a186c6.png&amp;w=540&amp;sig=f3c141c2fe708c031c32c083cb038a736fd8da87 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_18.55.41.17f234a186c6.png&amp;w=1080&amp;sig=9ded3045cee09d34cd6f5dada7a4eea669ac4808 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_18.55.41.17f234a186c6.png 2160w" sizes="75vw" alt="Example Spreadsheet With Simple Conditional Formatting"/></a></p>
2400 <p>Code-wise, the only things that are new here are the objects <code>DifferentialStyle</code> and <code>Rule</code>:</p>
2401 <ul>
2402 <li><strong><code>DifferentialStyle</code></strong> is quite similar to <code>NamedStyle</code>, which you already saw above, and it&rsquo;s used to aggregate multiple styles such as fonts, borders, alignment, and so forth.</li>
<li><strong><code>Rule</code></strong> holds the condition the cells must meet (here, the expression formula) and the style to apply when they meet it. The range you pass to <code>conditional_formatting.add()</code> decides which cells get checked.</li>
2404 </ul>
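<p>The rule&rsquo;s formula, <code>$H1&lt;3</code>, only mentions row 1, yet it&rsquo;s checked for every cell in <code>A1:O100</code>. Excel evaluates a conditional-formatting formula relative to the top-left cell of the range, so the <code>$</code> pins the column to <code>H</code> while the row number follows each row. As an analogy, and assuming that <code>openpyxl</code>&rsquo;s <code>Translator</code> (a helper meant for copying formulas between cells) shifts references the same way, you can watch the references move from <code>A1</code> to <code>O5</code>. The <code>=5</code> comparison is only a stand-in:</p>
<div class="highlight python"><pre># Sketch: how a relative formula such as "$H1" shifts across a range.
# Assumption: Translator, built for copying formulas between cells, mirrors
# how Excel re-evaluates a conditional-formatting formula for each cell.
from openpyxl.formula.translate import Translator

# "$H1": column H is absolute ($), row 1 is relative
anchored = Translator("=$H1=5", origin="A1").translate_formula("O5")

# "H1": column and row are both relative, so the column drifts too
unanchored = Translator("=H1=5", origin="A1").translate_formula("O5")

print(anchored)    # =$H5=5
print(unanchored)  # =V5=5
</pre></div>
<p>Without the <code>$</code>, the column drifts along with the cell, so every cell outside column <code>A</code> would end up comparing the wrong column.</p>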
2405 <p>Using a <code>Rule</code> object, you can create numerous conditional formatting scenarios.</p>
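<p>For instance, the balance-sheet scenario from the start of this section could be sketched with two <code>Rule</code> objects of type <code>&quot;cellIs&quot;</code>, which compare each cell&rsquo;s own value against the formula. The sample totals, font colors, and file name are placeholders rather than part of the original example:</p>
<div class="highlight python"><pre># Sketch: red negative totals and green positive totals, built from plain
# Rule objects. Data, colors, and file name are illustrative placeholders.
from openpyxl import Workbook
from openpyxl.formatting.rule import Rule
from openpyxl.styles import Font
from openpyxl.styles.differential import DifferentialStyle

workbook = Workbook()
sheet = workbook.active
sheet.append(["Month", "Total"])
for month, total in [("Jan", 1200), ("Feb", -300), ("Mar", 450)]:
    sheet.append([month, total])

# "cellIs" rules compare each cell's own value, so no $-anchored formula is needed
red_rule = Rule(type="cellIs", operator="lessThan", formula=["0"],
                dxf=DifferentialStyle(font=Font(color="9C0006")))
green_rule = Rule(type="cellIs", operator="greaterThan", formula=["0"],
                  dxf=DifferentialStyle(font=Font(color="006100")))

# Both rules target the same range of totals
sheet.conditional_formatting.add("B2:B4", red_rule)
sheet.conditional_formatting.add("B2:B4", green_rule)
workbook.save("balance_sheet.xlsx")
</pre></div>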
<p>However, for simplicity&rsquo;s sake, the <code>openpyxl</code> package offers three built-in formats that make it easier to create a few common conditional formatting patterns. These built-ins are:</p>
2407 <ul>
2408 <li><code>ColorScale</code></li>
2409 <li><code>IconSet</code></li>
2410 <li><code>DataBar</code></li>
2411 </ul>
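<p>Under the hood, these three helpers are factory functions that return ordinary <code>Rule</code> objects, so you add them with <code>conditional_formatting.add()</code> just like the hand-built rule above. A quick check, with illustrative arguments:</p>
<div class="highlight python"><pre># Sketch: each built-in helper returns a plain Rule with a preset type
from openpyxl.formatting.rule import (ColorScaleRule, DataBarRule,
                                      IconSetRule, Rule)

color_scale = ColorScaleRule(start_type="min", start_color="FF0000",
                             end_type="max", end_color="00FF00")
icon_set = IconSetRule("3Arrows", "num", [1, 2, 3])
data_bar = DataBarRule(start_type="min", end_type="max", color="638EC6")

for rule in (color_scale, icon_set, data_bar):
    print(type(rule).__name__, rule.type)
# Rule colorScale
# Rule iconSet
# Rule dataBar
</pre></div>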
2412 <p>The <strong>ColorScale</strong> gives you the ability to create color gradients:</p>
2413 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">openpyxl.formatting.rule</span> <span class="k">import</span> <span class="n">ColorScaleRule</span>
2414 <span class="gp">&gt;&gt;&gt; </span><span class="n">color_scale_rule</span> <span class="o">=</span> <span class="n">ColorScaleRule</span><span class="p">(</span><span class="n">start_type</span><span class="o">=</span><span class="s2">&quot;min&quot;</span><span class="p">,</span>
<span class="gp">... </span>                                  <span class="n">start_color</span><span class="o">=</span><span class="s2">&quot;FF0000&quot;</span><span class="p">,</span>
<span class="gp">... </span>                                  <span class="n">end_type</span><span class="o">=</span><span class="s2">&quot;max&quot;</span><span class="p">,</span>
<span class="gp">... </span>                                  <span class="n">end_color</span><span class="o">=</span><span class="s2">&quot;00FF00&quot;</span><span class="p">)</span>
2418 
2419 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Again, let&#39;s add this gradient to the star ratings, column &quot;H&quot;</span>
2420 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">conditional_formatting</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">&quot;H2:H100&quot;</span><span class="p">,</span> <span class="n">color_scale_rule</span><span class="p">)</span>
2421 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample_conditional_formatting_color_scale.xlsx&quot;</span><span class="p">)</span>
2422 </pre></div>
2423 
2424 <p>Now you should see a color gradient on column <code>H</code>, from red to green, according to the star rating:</p>
2425 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_19.00.57.26756963c1e9.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_19.00.57.26756963c1e9.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.00.57.26756963c1e9.png&amp;w=540&amp;sig=782964d150a8adc1de811fab78b0accde6357f85 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.00.57.26756963c1e9.png&amp;w=1080&amp;sig=4c7c87c194ec0a8eb3a9e73fdabf3ead991e96ea 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_19.00.57.26756963c1e9.png 2160w" sizes="75vw" alt="Example Spreadsheet With Color Scale Conditional Formatting"/></a></p>
2426 <p>You can also add a third color and make two gradients instead:</p>
2427 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">openpyxl.formatting.rule</span> <span class="k">import</span> <span class="n">ColorScaleRule</span>
2428 <span class="gp">&gt;&gt;&gt; </span><span class="n">color_scale_rule</span> <span class="o">=</span> <span class="n">ColorScaleRule</span><span class="p">(</span><span class="n">start_type</span><span class="o">=</span><span class="s2">&quot;num&quot;</span><span class="p">,</span>
2429 <span class="gp">... </span>                                  <span class="n">start_value</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="gp">... </span>                                  <span class="n">start_color</span><span class="o">=</span><span class="s2">&quot;FF0000&quot;</span><span class="p">,</span>
<span class="gp">... </span>                                  <span class="n">mid_type</span><span class="o">=</span><span class="s2">&quot;num&quot;</span><span class="p">,</span>
<span class="gp">... </span>                                  <span class="n">mid_value</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
<span class="gp">... </span>                                  <span class="n">mid_color</span><span class="o">=</span><span class="s2">&quot;FFFF00&quot;</span><span class="p">,</span>
<span class="gp">... </span>                                  <span class="n">end_type</span><span class="o">=</span><span class="s2">&quot;num&quot;</span><span class="p">,</span>
<span class="gp">... </span>                                  <span class="n">end_value</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
<span class="gp">... </span>                                  <span class="n">end_color</span><span class="o">=</span><span class="s2">&quot;00FF00&quot;</span><span class="p">)</span>
2437 
2438 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Again, let&#39;s add this gradient to the star ratings, column &quot;H&quot;</span>
2439 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">conditional_formatting</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">&quot;H2:H100&quot;</span><span class="p">,</span> <span class="n">color_scale_rule</span><span class="p">)</span>
2440 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample_conditional_formatting_color_scale_3.xlsx&quot;</span><span class="p">)</span>
2441 </pre></div>
2442 
2443 <p>This time, you&rsquo;ll notice that star ratings between 1 and 3 have a gradient from red to yellow, and star ratings between 3 and 5 have a gradient from yellow to green:</p>
2444 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_19.03.30.0de9a2ff9866.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_19.03.30.0de9a2ff9866.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.03.30.0de9a2ff9866.png&amp;w=540&amp;sig=17daddc356c8ff5c78497b200fe57ab69f80617d 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.03.30.0de9a2ff9866.png&amp;w=1080&amp;sig=6dcb18f4aa3f8ae0a395ca1e3581f8d4c59805ad 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_19.03.30.0de9a2ff9866.png 2160w" sizes="75vw" alt="Example Spreadsheet With 2 Color Scales Conditional Formatting"/></a></p>
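<p>Fixed numbers suit a rating that always runs from 1 to 5. For data without a known range, a sketch like the following anchors the three stops on percentiles instead. The 10/50/90 cut-offs and the hex colors are illustrative choices, and you&rsquo;d add the rule with <code>sheet.conditional_formatting.add()</code> as before:</p>
<div class="highlight python"><pre># Sketch: a three-color scale whose stops are percentiles, not fixed numbers.
# Cut-offs and colors are illustrative.
from openpyxl.formatting.rule import ColorScaleRule

percentile_rule = ColorScaleRule(start_type="percentile", start_value=10,
                                 start_color="F8696B",
                                 mid_type="percentile", mid_value=50,
                                 mid_color="FFEB84",
                                 end_type="percentile", end_value=90,
                                 end_color="63BE7B")

# Each stop becomes one "cfvo" entry paired with one color
print([stop.type for stop in percentile_rule.colorScale.cfvo])
# ['percentile', 'percentile', 'percentile']
print(len(percentile_rule.colorScale.color))  # 3
</pre></div>
<p>Because the stops follow the distribution of the data rather than its extremes, a handful of outliers shouldn&rsquo;t squeeze everything else into a narrow band of color.</p>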
2445 <p>The <strong>IconSet</strong> allows you to add an icon to the cell according to its value:</p>
2446 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">openpyxl.formatting.rule</span> <span class="k">import</span> <span class="n">IconSetRule</span>
2447 
2448 <span class="gp">&gt;&gt;&gt; </span><span class="n">icon_set_rule</span> <span class="o">=</span> <span class="n">IconSetRule</span><span class="p">(</span><span class="s2">&quot;5Arrows&quot;</span><span class="p">,</span> <span class="s2">&quot;num&quot;</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
2449 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">conditional_formatting</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">&quot;H2:H100&quot;</span><span class="p">,</span> <span class="n">icon_set_rule</span><span class="p">)</span>
2450 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">&quot;sample_conditional_formatting_icon_set.xlsx&quot;</span><span class="p">)</span>
2451 </pre></div>
2452 
2453 <p>You&rsquo;ll see a colored arrow next to the star rating. This arrow is red and points down when the value of the cell is 1 and, as the rating gets better, the arrow starts pointing up and becomes green:</p>
2454 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_19.07.29.23e75ff46771.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_19.07.29.23e75ff46771.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.07.29.23e75ff46771.png&amp;w=540&amp;sig=388fc68ff53fa2e2d3d5678acfd41f10fa8eccde 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.07.29.23e75ff46771.png&amp;w=1080&amp;sig=e33f483f9758189782bb619a6714c948130375aa 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_19.07.29.23e75ff46771.png 2160w" sizes="75vw" alt="Example Spreadsheet With Icon Set Conditional Formatting"/></a></p>
<p>The <code>openpyxl</code> documentation has a <a href="https://openpyxl.readthedocs.io/en/stable/formatting.html#iconset">full list</a> of the other icon sets you can use besides arrows.</p>
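<p>As one example from that list, here&rsquo;s a sketch of a traffic-light icon set whose thresholds are percentages of the range, with <code>showValue=False</code> so the cell shows only the icon. The thresholds are illustrative:</p>
<div class="highlight python"><pre># Sketch: traffic-light icons at 0%, 33%, and 67% of the range,
# with the cell's number hidden. Thresholds are illustrative.
from openpyxl.formatting.rule import IconSetRule

traffic_lights = IconSetRule("3TrafficLights1", "percent", [0, 33, 67],
                             showValue=False)

print(traffic_lights.iconSet.iconSet)    # 3TrafficLights1
print(len(traffic_lights.iconSet.cfvo))  # 3
print(traffic_lights.iconSet.showValue)  # False
</pre></div>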
2456 <p>Finally, the <strong>DataBar</strong> allows you to create progress bars:</p>
2457 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">openpyxl.formatting.rule</span> <span class="k">import</span> <span class="n">DataBarRule</span>
2458 
2459 <span class="gp">&gt;&gt;&gt; </span><span class="n">data_bar_rule</span> <span class="o">=</span> <span class="n">DataBarRule</span><span class="p">(</span><span class="n">start_type</span><span class="o">=</span><span class="s2">&quot;num&quot;</span><span class="p">,</span>
2460 <span class="gp">... </span>                            <span class="n">start_value</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
2461 <span class="gp">... </span>                            <span class="n">end_type</span><span class="o">=</span><span class="s2">&quot;num&quot;</span><span class="p">,</span>
<span class="gp">... </span>                            <span class="n">end_value</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
<span class="gp">... </span>                            <span class="n">color</span><span class="o">=</span><span class="s2">&quot;00FF00&quot;</span><span class="p">)</span>
2464 <span class="gp">&gt;&gt;&gt; </span><span class="n">sheet</span><span class="o">.</span><span class="n">conditional_formatting</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">&quot;H2:H100&quot;</span><span class="p">,</span> <span class="n">data_bar_rule</span><span class="p">)</span>
2465 <span class="gp">&gt;&gt;&gt; </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">&quot;sample_conditional_formatting_data_bar.xlsx&quot;</span><span class="p">)</span>
2466 </pre></div>
2467 
2468 <p>You&rsquo;ll now see a green progress bar that gets fuller the closer the star rating is to the number 5:</p>
2469 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_19.09.10.ebbe032c088d.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_19.09.10.ebbe032c088d.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.09.10.ebbe032c088d.png&amp;w=540&amp;sig=a7b8e3515fde3ff6b662ff1780cbc290da9ce2ad 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_19.09.10.ebbe032c088d.png&amp;w=1080&amp;sig=1e3d2befc147a43c5b98061cf5888abf22ca4c4a 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_19.09.10.ebbe032c088d.png 2160w" sizes="75vw" alt="Example Spreadsheet With Data Bar Conditional Formatting"/></a></p>
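<p>If your values don&rsquo;t have fixed bounds like the 1-to-5 star rating, a variant could use <code>&quot;min&quot;</code> and <code>&quot;max&quot;</code> as the bar&rsquo;s endpoints and hide the numbers with <code>showValue=False</code>. The color is an illustrative choice:</p>
<div class="highlight python"><pre># Sketch: a data bar scaled to the lowest and highest values present,
# showing only the bar. The color is illustrative.
from openpyxl.formatting.rule import DataBarRule

bar_only_rule = DataBarRule(start_type="min", end_type="max",
                            color="638EC6", showValue=False)

print(bar_only_rule.dataBar.color.rgb)  # 00638EC6
print(bar_only_rule.dataBar.showValue)  # False
</pre></div>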
2470 <p>As you can see, there are a lot of cool things you can do with conditional formatting.</p>
<p>Here, you saw only a few examples of what you can achieve with it, but the <code>openpyxl</code> <a href="https://openpyxl.readthedocs.io/en/stable/formatting.html">documentation</a> covers many other options.</p>
2472 <h3 id="adding-images">Adding Images</h3>
<p>Even though images are not something that you&rsquo;ll often see in a spreadsheet, it&rsquo;s quite cool to be able to add them. Maybe you can use them for branding purposes or to make spreadsheets more personal.</p>
<p>To load images into a spreadsheet with <code>openpyxl</code>, you&rsquo;ll have to install <code>Pillow</code>:</p>
2475 <div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install Pillow
2476 </pre></div>
2477 
2478 <p>Apart from that, you&rsquo;ll also need an image. For this example, you can grab the <em>Real Python</em> logo below and convert it from <code>.webp</code> to <code>.png</code> using an online converter such as <a href="https://cloudconvert.com/webp-to-png">cloudconvert.com</a>, save the final file as <code>logo.png</code>, and copy it to the root folder where you&rsquo;re running your examples:</p>
2479 <p><a href="https://files.realpython.com/media/real-python-logo-round.4d95338e8944.png" target="_blank"><img class="img-fluid mx-auto d-block w-25" src="https://files.realpython.com/media/real-python-logo-round.4d95338e8944.png" width="1500" height="1500" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/real-python-logo-round.4d95338e8944.png&amp;w=375&amp;sig=e431a39c9d7f2d5963a81687571a41288c359142 375w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/real-python-logo-round.4d95338e8944.png&amp;w=750&amp;sig=a098752adfc378feee6bc69748af593ed078b8c0 750w, https://files.realpython.com/media/real-python-logo-round.4d95338e8944.png 1500w" sizes="75vw" alt="Real Python Logo"/></a></p>
<p>Afterward, this is the code you need to import that image into the <code>hello_world.xlsx</code> spreadsheet:</p>
2481 <div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
2482 <span class="kn">from</span> <span class="nn">openpyxl.drawing.image</span> <span class="k">import</span> <span class="n">Image</span>
2483 
2484 <span class="c1"># Let&#39;s use the hello_world spreadsheet since it has less data</span>
2485 <span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;hello_world.xlsx&quot;</span><span class="p">)</span>
2486 <span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
2487 
2488 <span class="n">logo</span> <span class="o">=</span> <span class="n">Image</span><span class="p">(</span><span class="s2">&quot;logo.png&quot;</span><span class="p">)</span>
2489 
2490 <span class="c1"># A bit of resizing to not fill the whole spreadsheet with the logo</span>
2491 <span class="n">logo</span><span class="o">.</span><span class="n">height</span> <span class="o">=</span> <span class="mi">150</span>
2492 <span class="n">logo</span><span class="o">.</span><span class="n">width</span> <span class="o">=</span> <span class="mi">150</span>
2493 
2494 <span class="n">sheet</span><span class="o">.</span><span class="n">add_image</span><span class="p">(</span><span class="n">logo</span><span class="p">,</span> <span class="s2">&quot;A3&quot;</span><span class="p">)</span>
2495 <span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;hello_world_logo.xlsx&quot;</span><span class="p">)</span>
2496 </pre></div>
2497 
2498 <p>You have an image on your spreadsheet! Here it is:</p>
2499 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_20.05.30.2a69f2a77f68.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_20.05.30.2a69f2a77f68.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_20.05.30.2a69f2a77f68.png&amp;w=540&amp;sig=574c2c425c011fa21e790f7cd4a41547f2449b01 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_20.05.30.2a69f2a77f68.png&amp;w=1080&amp;sig=e6b9c78c7daa929ae86f887d560c3f50f0851d5d 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_20.05.30.2a69f2a77f68.png 2160w" sizes="75vw" alt="Example Spreadsheet With Image"/></a></p>
<p>The image&rsquo;s top-left corner is anchored to the cell you chose, in this case, <code>A3</code>.</p>
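<p>If you&rsquo;d rather not download and convert the logo, the sketch below generates a placeholder PNG in memory with Pillow and adds it the same way. The image size, color, anchor cell, and file name are all illustrative:</p>
<div class="highlight python"><pre># Sketch: add an image without a file on disk. Pillow draws a placeholder
# square into an in-memory buffer; size, color, and names are illustrative.
from io import BytesIO

from PIL import Image as PILImage
from openpyxl import Workbook
from openpyxl.drawing.image import Image

buffer = BytesIO()
PILImage.new("RGB", (300, 300), "royalblue").save(buffer, format="PNG")
buffer.seek(0)

workbook = Workbook()
sheet = workbook.active

placeholder = Image(buffer)
print(placeholder.width, placeholder.height)  # 300 300

# Shrink it, then anchor its top-left corner to C2
placeholder.width = 150
placeholder.height = 150
sheet.add_image(placeholder, "C2")
print(placeholder.anchor)  # C2

workbook.save("placeholder_image.xlsx")
</pre></div>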
2501 <h3 id="adding-pretty-charts">Adding Pretty Charts</h3>
2502 <p>Another powerful thing you can do with spreadsheets is create an incredible variety of charts. </p>
2503 <p>Charts are a great way to visualize and understand loads of data quickly. There are a lot of different chart types: bar chart, pie chart, line chart, and so on. <code>openpyxl</code> has support for a lot of them.</p>
<p>Here, you&rsquo;ll see only a couple of chart examples because the approach is the same for every chart type.</p>
2505 <div class="alert alert-primary" role="alert">
2506 <p><strong>Note:</strong> A few of the chart types that <code>openpyxl</code> currently doesn&rsquo;t have support for are Funnel, Gantt, Pareto, Treemap, Waterfall, Map, and Sunburst.</p>
2507 </div>
<p>For any chart you want to build, you&rsquo;ll need to define the chart type (<code>BarChart</code>, <code>LineChart</code>, and so forth) plus the data to be used for the chart, which you select with a <code>Reference</code>.</p>
2509 <p>Before you can build your chart, you need to define what data you want to see represented in it. Sometimes, you can use the dataset as is, but other times you need to massage the data a bit to get additional information.</p>
2510 <p>Let&rsquo;s start by building a new workbook with some sample data:</p>
2511 <div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
2512 <span class="lineno"> 2 </span><span class="kn">from</span> <span class="nn">openpyxl.chart</span> <span class="k">import</span> <span class="n">BarChart</span><span class="p">,</span> <span class="n">Reference</span>
2513 <span class="lineno"> 3 </span>
2514 <span class="lineno"> 4 </span><span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
2515 <span class="lineno"> 5 </span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
2516 <span class="lineno"> 6 </span>
2517 <span class="lineno"> 7 </span><span class="c1"># Let&#39;s create some sample sales data</span>
2518 <span class="lineno"> 8 </span><span class="n">rows</span> <span class="o">=</span> <span class="p">[</span>
2519 <span class="lineno"> 9 </span>    <span class="p">[</span><span class="s2">&quot;Product&quot;</span><span class="p">,</span> <span class="s2">&quot;Online&quot;</span><span class="p">,</span> <span class="s2">&quot;Store&quot;</span><span class="p">],</span>
2520 <span class="lineno">10 </span>    <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">45</span><span class="p">],</span>
2521 <span class="lineno">11 </span>    <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">30</span><span class="p">],</span>
2522 <span class="lineno">12 </span>    <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">40</span><span class="p">,</span> <span class="mi">25</span><span class="p">],</span>
2523 <span class="lineno">13 </span>    <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">50</span><span class="p">,</span> <span class="mi">30</span><span class="p">],</span>
2524 <span class="lineno">14 </span>    <span class="p">[</span><span class="mi">5</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">25</span><span class="p">],</span>
2525 <span class="lineno">15 </span>    <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">25</span><span class="p">,</span> <span class="mi">35</span><span class="p">],</span>
2526 <span class="lineno">16 </span>    <span class="p">[</span><span class="mi">7</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">40</span><span class="p">],</span>
2527 <span class="lineno">17 </span><span class="p">]</span>
2528 <span class="lineno">18 </span>
2529 <span class="lineno">19 </span><span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">rows</span><span class="p">:</span>
2530 <span class="lineno">20 </span>    <span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
2531 </pre></div>
2532 
<p>Now you&rsquo;re going to start by creating a <strong>bar chart</strong> that displays the online and in-store sales for each product:</p>
2534 <div class="highlight python"><pre><span></span><span class="lineno">22 </span><span class="n">chart</span> <span class="o">=</span> <span class="n">BarChart</span><span class="p">()</span>
2535 <span class="lineno">23 </span><span class="n">data</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
2536 <span class="lineno">24 </span>                 <span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
2537 <span class="lineno">25 </span>                 <span class="n">max_row</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span>
2538 <span class="lineno">26 </span>                 <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
2539 <span class="lineno">27 </span>                 <span class="n">max_col</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
2540 <span class="lineno">28 </span>
2541 <span class="lineno">29 </span><span class="n">chart</span><span class="o">.</span><span class="n">add_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">titles_from_data</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
2542 <span class="lineno">30 </span><span class="n">sheet</span><span class="o">.</span><span class="n">add_chart</span><span class="p">(</span><span class="n">chart</span><span class="p">,</span> <span class="s2">&quot;E2&quot;</span><span class="p">)</span>
2543 <span class="lineno">31 </span>
2544 <span class="lineno">32 </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">&quot;chart.xlsx&quot;</span><span class="p">)</span>
2545 </pre></div>
2546 
<p>There you have it. Below, you can see a very straightforward bar chart showing the difference between <strong>online</strong> product sales and <strong>in-store</strong> product sales:</p>
2548 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_20.59.43.7eac35127b97.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_20.59.43.7eac35127b97.png" width="2160" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_20.59.43.7eac35127b97.png&amp;w=540&amp;sig=bcdfa015a56b903169702ddbeb1ec06c8d67bc87 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_20.59.43.7eac35127b97.png&amp;w=1080&amp;sig=904be9b2a172b3ae7436311c9d05f6a9ad8ae451 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_20.59.43.7eac35127b97.png 2160w" sizes="75vw" alt="Example Spreadsheet With Bar Chart"/></a></p>
<p>As with images, the top-left corner of the chart is on the cell you added the chart to. In your case, that&rsquo;s cell <code>E2</code>.</p>
2550 <div class="alert alert-primary" role="alert">
2551 <p><strong>Note:</strong> Depending on whether you&rsquo;re using Microsoft Excel or an open-source alternative (LibreOffice or OpenOffice), the chart might look slightly different.</p>
2552 </div>
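<p>The chart above plots the online and in-store columns as separate series. To chart a single total per product, you could massage the data first: compute a new <code>Total</code> column, use the <code>Product</code> column as category labels with <code>set_categories()</code>, and add titles. This is a sketch; column <code>D</code>, the titles, and the file name are illustrative:</p>
<div class="highlight python"><pre># Sketch: derive a Total column before charting, label the x-axis with the
# product numbers, and add titles. Column D and the titles are illustrative.
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

workbook = Workbook()
sheet = workbook.active

rows = [
    ["Product", "Online", "Store"],
    [1, 30, 45],
    [2, 40, 30],
    [3, 40, 25],
    [4, 50, 30],
    [5, 30, 25],
    [6, 25, 35],
    [7, 20, 40],
]
for row in rows:
    sheet.append(row)

# New column D: total sales per product (online + in-store)
sheet["D1"] = "Total"
for row_number in range(2, sheet.max_row + 1):
    online = sheet.cell(row=row_number, column=2).value
    store = sheet.cell(row=row_number, column=3).value
    sheet.cell(row=row_number, column=4, value=online + store)

chart = BarChart()
chart.title = "Total sales per product"
chart.x_axis.title = "Product"
chart.y_axis.title = "Sales"

# D1 holds the series title; A2:A8 holds the category labels
totals = Reference(sheet, min_col=4, min_row=1, max_row=8)
products = Reference(sheet, min_col=1, min_row=2, max_row=8)
chart.add_data(totals, titles_from_data=True)
chart.set_categories(products)

sheet.add_chart(chart, "F2")
workbook.save("chart_totals.xlsx")
</pre></div>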
2553 <p>Try creating a <strong>line chart</strong> instead, changing the data a bit:</p>
2554 <div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">import</span> <span class="nn">random</span>
2555 <span class="lineno"> 2 </span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
2556 <span class="lineno"> 3 </span><span class="kn">from</span> <span class="nn">openpyxl.chart</span> <span class="k">import</span> <span class="n">LineChart</span><span class="p">,</span> <span class="n">Reference</span>
2557 <span class="lineno"> 4 </span>
2558 <span class="lineno"> 5 </span><span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
2559 <span class="lineno"> 6 </span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
2560 <span class="lineno"> 7 </span>
2561 <span class="lineno"> 8 </span><span class="c1"># Let&#39;s create some sample sales data</span>
2562 <span class="lineno"> 9 </span><span class="n">rows</span> <span class="o">=</span> <span class="p">[</span>
2563 <span class="lineno">10 </span>    <span class="p">[</span><span class="s2">&quot;&quot;</span><span class="p">,</span> <span class="s2">&quot;January&quot;</span><span class="p">,</span> <span class="s2">&quot;February&quot;</span><span class="p">,</span> <span class="s2">&quot;March&quot;</span><span class="p">,</span> <span class="s2">&quot;April&quot;</span><span class="p">,</span>
2564 <span class="lineno">11 </span>    <span class="s2">&quot;May&quot;</span><span class="p">,</span> <span class="s2">&quot;June&quot;</span><span class="p">,</span> <span class="s2">&quot;July&quot;</span><span class="p">,</span> <span class="s2">&quot;August&quot;</span><span class="p">,</span> <span class="s2">&quot;September&quot;</span><span class="p">,</span>
2565 <span class="lineno">12 </span>     <span class="s2">&quot;October&quot;</span><span class="p">,</span> <span class="s2">&quot;November&quot;</span><span class="p">,</span> <span class="s2">&quot;December&quot;</span><span class="p">],</span>
2566 <span class="lineno">13 </span>    <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="p">],</span>
2567 <span class="lineno">14 </span>    <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="p">],</span>
2568 <span class="lineno">15 </span>    <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="p">],</span>
2569 <span class="lineno">16 </span><span class="p">]</span>
2570 <span class="lineno">17 </span>
2571 <span class="lineno">18 </span><span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">rows</span><span class="p">:</span>
2572 <span class="lineno">19 </span>    <span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
2573 <span class="lineno">20 </span>
2574 <span class="lineno">21 </span><span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
2575 <span class="lineno">22 </span>                           <span class="n">max_row</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
2576 <span class="lineno">23 </span>                           <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
2577 <span class="lineno">24 </span>                           <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">):</span>
2578 <span class="lineno">25 </span>    <span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">row</span><span class="p">:</span>
2579 <span class="lineno">26 </span>        <span class="n">cell</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
2580 </pre></div>
2581 
<p>The code above generates random sales data for 3 different products across a whole year.</p>
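<p>If you&rsquo;d rather have the random values ready before anything is written to the worksheet, you can build each product row in full and append it once. Here&rsquo;s a minimal plain-Python sketch of that alternative. <code>build_sales_rows</code> is a hypothetical helper, not part of <code>openpyxl</code>:</p>
<div class="highlight python"><pre><span></span>import random

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def build_sales_rows(product_count=3):
    """Return a header row plus one fully populated row per product.

    Column A holds the product number and the next 12 cells hold one
    random sales value (5 to 99) per month, the same layout as above.
    """
    rows = [[""] + MONTHS]
    for product in range(1, product_count + 1):
        rows.append([product] + [random.randrange(5, 100) for _ in MONTHS])
    return rows


# Each row could then be written with sheet.append(row) in a single pass
for row in build_sales_rows():
    print(row)
</pre></div>
<p>Appending these rows with <code>sheet.append(row)</code> gives the same layout as the two loops above (product number in column A, months in columns B to M), just without the second pass over <code>iter_rows()</code>.</p>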
<p>Once that&rsquo;s done, you can create a line chart with the following code:</p>
2584 <div class="highlight python"><pre><span></span><span class="lineno">28 </span><span class="n">chart</span> <span class="o">=</span> <span class="n">LineChart</span><span class="p">()</span>
2585 <span class="lineno">29 </span><span class="n">data</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
2586 <span class="lineno">30 </span>                 <span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
2587 <span class="lineno">31 </span>                 <span class="n">max_row</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
2588 <span class="lineno">32 </span>                 <span class="n">min_col</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
2589 <span class="lineno">33 </span>                 <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">)</span>
2590 <span class="lineno">34 </span>
2591 <span class="lineno">35 </span><span class="n">chart</span><span class="o">.</span><span class="n">add_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">from_rows</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">titles_from_data</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
2592 <span class="lineno">36 </span><span class="n">sheet</span><span class="o">.</span><span class="n">add_chart</span><span class="p">(</span><span class="n">chart</span><span class="p">,</span> <span class="s2">&quot;C6&quot;</span><span class="p">)</span>
2593 <span class="lineno">37 </span>
2594 <span class="lineno">38 </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">&quot;line_chart.xlsx&quot;</span><span class="p">)</span>
2595 </pre></div>
2596 
2597 <p>Here&rsquo;s the outcome of the above piece of code:</p>
2598 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.06.42.e4e52ab1b433.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.06.42.e4e52ab1b433.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.06.42.e4e52ab1b433.png&amp;w=540&amp;sig=362319a9716ded57c9567de98c52a4dc805b5346 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.06.42.e4e52ab1b433.png&amp;w=1080&amp;sig=cbd7b37aa77318e4ed8d2401281817fb53a7144b 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.06.42.e4e52ab1b433.png 2160w" sizes="75vw" alt="Example Spreadsheet With Line Chart"/></a></p>
<p>Keep in mind that you&rsquo;re passing <code>from_rows=True</code> when adding the data. This argument makes the chart plot row by row instead of column by column, so each row becomes one series (one line on the chart).</p>
<p>In the sample data, each product has a row with 12 values (1 column per month), which is why you use <code>from_rows</code>. Without that argument, the chart plots by column by default, and you&rsquo;d get a month-by-month comparison of sales instead.</p>
<p>The other change tied to this argument is that the <code>Reference</code> now starts from the first column, <code>min_col=1</code>, instead of the second one. Because <code>titles_from_data=True</code> is also set, the chart uses the first cell of each row as that series&rsquo; title, so column A, which holds the product numbers, has to be part of the reference.</p>
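<p>To see why the orientation matters, here&rsquo;s a plain-Python sketch that splits a small block of the sample data into series both ways. The <code>split_series</code> helper is hypothetical and only models the idea behind <code>from_rows</code> and <code>titles_from_data</code>; it isn&rsquo;t how <code>openpyxl</code> implements them:</p>
<div class="highlight python"><pre><span></span>def split_series(block, from_rows):
    """Split a 2D block into (title, values) pairs, one pair per series.

    The first cell of each row (from_rows=True) or of each column
    (from_rows=False) becomes the series title, like titles_from_data=True.
    """
    lines = block if from_rows else [list(col) for col in zip(*block)]
    return [(line[0], list(line[1:])) for line in lines]


# Rows 2-4 of the sample sheet, shortened to columns A-D (3 months)
block = [
    [1, 10, 20, 30],
    [2, 40, 50, 60],
    [3, 70, 80, 90],
]

print(split_series(block, from_rows=True))   # one series per product
print(split_series(block, from_rows=False))  # one series per column
</pre></div>
<p>By row, you get one series per product, titled with the value from column A. By column, column A turns into a series of its own and each month is titled with the first product&rsquo;s value, which is rarely what you want for this layout.</p>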
<p>There are a few other things you can change to make the chart more readable. For example, you can add specific categories to the chart, taken here from the header row:</p>
2603 <div class="highlight python"><pre><span></span><span class="n">cats</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
2604                  <span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
2605                  <span class="n">max_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
2606                  <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
2607                  <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">)</span>
2608 <span class="n">chart</span><span class="o">.</span><span class="n">set_categories</span><span class="p">(</span><span class="n">cats</span><span class="p">)</span>
2609 </pre></div>
2610 
2611 <p>Add this piece of code before saving the workbook, and you should see the month names appearing instead of numbers:</p>
2612 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.08.05.8867e2cced85.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.08.05.8867e2cced85.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.08.05.8867e2cced85.png&amp;w=540&amp;sig=6e48719b75e585dcb58fec3768def0630bb367ff 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.08.05.8867e2cced85.png&amp;w=1080&amp;sig=9fed1350289fe067d8f50c07516bfcc460f4a720 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.08.05.8867e2cced85.png 2160w" sizes="75vw" alt="Example Spreadsheet With Line Chart and Categories"/></a></p>
<p>Code-wise, this is a minimal change, but it makes the chart much easier to understand for anyone who opens the spreadsheet.</p>
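<p>A <code>Reference</code> is a rectangular range described by row and column numbers. The sketch below turns those numbers into the A1-style range Excel displays, which is a quick way to double-check that the categories cover <code>B1:M1</code>, the month names. Both helpers are hypothetical; <code>openpyxl</code> already ships <code>openpyxl.utils.get_column_letter()</code> for the letter conversion:</p>
<div class="highlight python"><pre><span></span>def column_letter(index):
    """Convert a 1-based column number to Excel letters (1 is A, 27 is AA)."""
    letters = ""
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1_range(min_row, max_row, min_col, max_col):
    """Return the A1-style range covered by the given row/column bounds."""
    return "%s%d:%s%d" % (column_letter(min_col), min_row,
                          column_letter(max_col), max_row)


print(a1_range(min_row=1, max_row=1, min_col=2, max_col=13))  # categories
print(a1_range(min_row=2, max_row=4, min_col=1, max_col=13))  # data
</pre></div>
<p>The first call prints <code>B1:M1</code> and the second <code>A2:M4</code>, confirming that the data reference includes column A for the series titles while the categories skip it.</p>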
<p>Another way to improve the chart&rsquo;s readability is to add axis titles. You can do that by setting the <code>title</code> of the chart&rsquo;s <code>x_axis</code> and <code>y_axis</code> attributes:</p>
2615 <div class="highlight python"><pre><span></span><span class="n">chart</span><span class="o">.</span><span class="n">x_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">&quot;Months&quot;</span>
2616 <span class="n">chart</span><span class="o">.</span><span class="n">y_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">&quot;Sales (per unit)&quot;</span>
2617 </pre></div>
2618 
<p>This generates a spreadsheet like the one below:</p>
2620 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.09.46.ce55f629b073.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.09.46.ce55f629b073.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.09.46.ce55f629b073.png&amp;w=540&amp;sig=ac73f73702b55a957c77d6da224cde46c2c9a802 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.09.46.ce55f629b073.png&amp;w=1080&amp;sig=408f6929a900e4a5ccb5cb1cca8647cf64d0d069 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.09.46.ce55f629b073.png 2160w" sizes="75vw" alt="Example Spreadsheet With Line Chart, Categories and Axis Titles"/></a></p>
2621 <p>As you can see, small changes like the above make reading your chart a much easier and quicker task.</p>
<p>You can also style your chart with one of Excel&rsquo;s built-in chart styles by setting the chart&rsquo;s <code>style</code> attribute to a number between 1 and 48. Depending on the number you choose, the chart&rsquo;s colors change as well:</p>
2623 <div class="highlight python"><pre><span></span><span class="c1"># You can play with this by choosing any number between 1 and 48</span>
2624 <span class="n">chart</span><span class="o">.</span><span class="n">style</span> <span class="o">=</span> <span class="mi">24</span>
2625 </pre></div>
2626 
2627 <p>With the style selected above, all lines have some shade of orange:</p>
2628 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.16.31.7df18bbe94cb.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.16.31.7df18bbe94cb.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.16.31.7df18bbe94cb.png&amp;w=540&amp;sig=4b0439c6c48d0b3f96f411e6198f6fd49d5c7026 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.16.31.7df18bbe94cb.png&amp;w=1080&amp;sig=4d0aad0a27321bb321dc9c88158f357155968121 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.16.31.7df18bbe94cb.png 2160w" sizes="75vw" alt="Example Spreadsheet With Line Chart, Categories, Axis Titles and Style"/></a></p>
2629 <p>There is no clear documentation on what each style number looks like, but <a href="https://1drv.ms/x/s!Asf0Y5Y4GI3Mg6kZNRd1IA09NLWv9A">this spreadsheet</a> has a few examples of the styles available.</p>
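<p>Since the style numbers aren&rsquo;t documented visually, one option is to render your own preview. Here&rsquo;s a sketch, assuming a small sample table shaped like the one above, that stacks one chart per style number in a single sheet. The <code>chart_styles.xlsx</code> file name, the quarterly columns, and the 16-row spacing between charts are arbitrary choices:</p>
<div class="highlight python"><pre><span></span>import random

from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

workbook = Workbook()
sheet = workbook.active

# Small sample table: a header row and three products with four values each
sheet.append(["", "Q1", "Q2", "Q3", "Q4"])
for product in range(1, 4):
    sheet.append([product] + [random.randrange(5, 100) for _ in range(4)])

data = Reference(worksheet=sheet, min_row=2, max_row=4, min_col=1, max_col=5)
categories = Reference(worksheet=sheet, min_row=1, max_row=1, min_col=2, max_col=5)

charts = []
for style in range(1, 49):
    chart = LineChart()
    chart.title = "Style %d" % style
    chart.style = style
    chart.add_data(data, from_rows=True, titles_from_data=True)
    chart.set_categories(categories)
    # Stack the charts vertically in column G, leaving 16 rows per chart
    sheet.add_chart(chart, "G%d" % (1 + (style - 1) * 16))
    charts.append(chart)

workbook.save("chart_styles.xlsx")
</pre></div>
<p>Opening the resulting file should show all 48 styles applied to the same data, each chart titled with its style number.</p>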
2630 <div class="card mb-3" id="collapse_card0fb191">
<div class="card-header border-0"><p class="m-0"><button class="btn" data-toggle="collapse" data-target="#collapse0fb191" aria-expanded="false" aria-controls="collapse0fb191">Complete Code Example</button></p></div>
2632 <div id="collapse0fb191" class="collapse" data-parent="#collapse_card0fb191"><div class="card-body" markdown="1">
2633 
2634 <p>Here&rsquo;s the full code used to generate the line chart with categories, axis titles, and style:</p>
2635 <div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">random</span>
2636 <span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
2637 <span class="kn">from</span> <span class="nn">openpyxl.chart</span> <span class="k">import</span> <span class="n">LineChart</span><span class="p">,</span> <span class="n">Reference</span>
2638 
2639 <span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
2640 <span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
2641 
2642 <span class="c1"># Let&#39;s create some sample sales data</span>
2643 <span class="n">rows</span> <span class="o">=</span> <span class="p">[</span>
2644     <span class="p">[</span><span class="s2">&quot;&quot;</span><span class="p">,</span> <span class="s2">&quot;January&quot;</span><span class="p">,</span> <span class="s2">&quot;February&quot;</span><span class="p">,</span> <span class="s2">&quot;March&quot;</span><span class="p">,</span> <span class="s2">&quot;April&quot;</span><span class="p">,</span>
2645     <span class="s2">&quot;May&quot;</span><span class="p">,</span> <span class="s2">&quot;June&quot;</span><span class="p">,</span> <span class="s2">&quot;July&quot;</span><span class="p">,</span> <span class="s2">&quot;August&quot;</span><span class="p">,</span> <span class="s2">&quot;September&quot;</span><span class="p">,</span>
2646      <span class="s2">&quot;October&quot;</span><span class="p">,</span> <span class="s2">&quot;November&quot;</span><span class="p">,</span> <span class="s2">&quot;December&quot;</span><span class="p">],</span>
2647     <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="p">],</span>
2648     <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="p">],</span>
2649     <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="p">],</span>
2650 <span class="p">]</span>
2651 
2652 <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">rows</span><span class="p">:</span>
2653     <span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
2654 
2655 <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">sheet</span><span class="o">.</span><span class="n">iter_rows</span><span class="p">(</span><span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
2656                            <span class="n">max_row</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
2657                            <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
2658                            <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">):</span>
2659     <span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">row</span><span class="p">:</span>
2660         <span class="n">cell</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
2661 
2662 <span class="c1"># Create a LineChart and add the main data</span>
2663 <span class="n">chart</span> <span class="o">=</span> <span class="n">LineChart</span><span class="p">()</span>
2664 <span class="n">data</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
                 <span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
                 <span class="n">max_row</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
                 <span class="n">min_col</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
                 <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">)</span>
2669 <span class="n">chart</span><span class="o">.</span><span class="n">add_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">titles_from_data</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">from_rows</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
2670 
2671 <span class="c1"># Add categories to the chart</span>
2672 <span class="n">cats</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
2673                  <span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
2674                  <span class="n">max_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
2675                  <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
2676                  <span class="n">max_col</span><span class="o">=</span><span class="mi">13</span><span class="p">)</span>
2677 <span class="n">chart</span><span class="o">.</span><span class="n">set_categories</span><span class="p">(</span><span class="n">cats</span><span class="p">)</span>
2678 
<span class="c1"># Set the X and Y axis titles</span>
2680 <span class="n">chart</span><span class="o">.</span><span class="n">x_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">&quot;Months&quot;</span>
2681 <span class="n">chart</span><span class="o">.</span><span class="n">y_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">&quot;Sales (per unit)&quot;</span>
2682 
2683 <span class="c1"># Apply a specific Style</span>
2684 <span class="n">chart</span><span class="o">.</span><span class="n">style</span> <span class="o">=</span> <span class="mi">24</span>
2685 
<span class="c1"># Place the chart at cell C6 and save the workbook</span>
2687 <span class="n">sheet</span><span class="o">.</span><span class="n">add_chart</span><span class="p">(</span><span class="n">chart</span><span class="p">,</span> <span class="s2">&quot;C6&quot;</span><span class="p">)</span>
2688 <span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">&quot;line_chart.xlsx&quot;</span><span class="p">)</span>
2689 </pre></div>
2690 
2691 </div></div>
2692 
2693 </div>
<p>There are many more chart types and customizations you can apply, so be sure to check out the <a href="https://openpyxl.readthedocs.io/en/stable/charts/introduction.html">package documentation</a> if you need some specific formatting.</p>
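<p>The <code>Reference</code> and <code>add_data()</code> pattern isn&rsquo;t specific to line charts. As a sketch, here&rsquo;s the same kind of data plotted as a column chart with <code>openpyxl</code>&rsquo;s <code>BarChart</code>. The sheet layout, the three months, and the <code>bar_chart.xlsx</code> file name are illustrative:</p>
<div class="highlight python"><pre><span></span>import random

from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

workbook = Workbook()
sheet = workbook.active

# Header row plus three products with three months of random sales
sheet.append(["", "January", "February", "March"])
for product in range(1, 4):
    sheet.append([product] + [random.randrange(5, 100) for _ in range(3)])

chart = BarChart()
chart.type = "col"  # vertical columns; "bar" draws horizontal bars
data = Reference(worksheet=sheet, min_row=2, max_row=4, min_col=1, max_col=4)
chart.add_data(data, from_rows=True, titles_from_data=True)
chart.set_categories(Reference(worksheet=sheet, min_row=1, max_row=1,
                               min_col=2, max_col=4))
chart.x_axis.title = "Months"
chart.y_axis.title = "Sales (per unit)"

sheet.add_chart(chart, "F2")
workbook.save("bar_chart.xlsx")
</pre></div>
<p>Only the chart class changes; the references, <code>from_rows</code>, categories, and axis titles work exactly as they did for the line chart.</p>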
2695 <h3 id="convert-python-classes-to-excel-spreadsheet">Convert Python Classes to Excel Spreadsheet</h3>
2696 <p>You already saw how to convert an Excel spreadsheet&rsquo;s data into Python classes, but now let&rsquo;s do the opposite.</p>
<p>Let&rsquo;s imagine you have a database and are using an Object-Relational Mapping (ORM) library to map database records to Python classes. Now, you want to export those same objects to a spreadsheet.</p>
2698 <p>Let&rsquo;s assume the following <a href="https://realpython.com/python-data-classes/">data classes</a> to represent the data coming from your database regarding product sales:</p>
2699 <div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">dataclasses</span> <span class="k">import</span> <span class="n">dataclass</span>
2700 <span class="kn">from</span> <span class="nn">typing</span> <span class="k">import</span> <span class="n">List</span>
2701 
2702 <span class="nd">@dataclass</span>
2703 <span class="k">class</span> <span class="nc">Sale</span><span class="p">:</span>
2704     <span class="nb">id</span><span class="p">:</span> <span class="nb">str</span>
2705     <span class="n">quantity</span><span class="p">:</span> <span class="nb">int</span>
2706 
2707 <span class="nd">@dataclass</span>
2708 <span class="k">class</span> <span class="nc">Product</span><span class="p">:</span>
2709     <span class="nb">id</span><span class="p">:</span> <span class="nb">str</span>
2710     <span class="n">name</span><span class="p">:</span> <span class="nb">str</span>
2711     <span class="n">sales</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">Sale</span><span class="p">]</span>
2712 </pre></div>
2713 
2714 <p>Now, let&rsquo;s generate some random data, assuming the above classes are stored in a <code>db_classes.py</code> file:</p>
2715 <div class="highlight python"><pre><span></span><span class="lineno"> 1 </span><span class="kn">import</span> <span class="nn">random</span>
2716 <span class="lineno"> 2 </span>
2717 <span class="lineno"> 3 </span><span class="c1"># Ignore these for now. You&#39;ll use them in a sec ;)</span>
2718 <span class="lineno"> 4 </span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
2719 <span class="lineno"> 5 </span><span class="kn">from</span> <span class="nn">openpyxl.chart</span> <span class="k">import</span> <span class="n">LineChart</span><span class="p">,</span> <span class="n">Reference</span>
2720 <span class="lineno"> 6 </span>
2721 <span class="lineno"> 7 </span><span class="kn">from</span> <span class="nn">db_classes</span> <span class="k">import</span> <span class="n">Product</span><span class="p">,</span> <span class="n">Sale</span>
2722 <span class="lineno"> 8 </span>
2723 <span class="lineno"> 9 </span><span class="n">products</span> <span class="o">=</span> <span class="p">[]</span>
2724 <span class="lineno">10 </span>
2725 <span class="lineno">11 </span><span class="c1"># Let&#39;s create 5 products</span>
2726 <span class="lineno">12 </span><span class="k">for</span> <span class="n">idx</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">6</span><span class="p">):</span>
2727 <span class="lineno">13 </span>    <span class="n">sales</span> <span class="o">=</span> <span class="p">[]</span>
2728 <span class="lineno">14 </span>
2729 <span class="lineno">15 </span>    <span class="c1"># Create 5 months of sales</span>
<span class="lineno">16 </span>    <span class="k">for</span> <span class="n">month</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">6</span><span class="p">):</span>
<span class="lineno">17 </span>        <span class="n">sale</span> <span class="o">=</span> <span class="n">Sale</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="nb">str</span><span class="p">(</span><span class="n">month</span><span class="p">),</span> <span class="n">quantity</span><span class="o">=</span><span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">100</span><span class="p">))</span>
2732 <span class="lineno">18 </span>        <span class="n">sales</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sale</span><span class="p">)</span>
2733 <span class="lineno">19 </span>
2734 <span class="lineno">20 </span>    <span class="n">product</span> <span class="o">=</span> <span class="n">Product</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="nb">str</span><span class="p">(</span><span class="n">idx</span><span class="p">),</span>
2735 <span class="lineno">21 </span>                      <span class="n">name</span><span class="o">=</span><span class="s2">&quot;Product </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">idx</span><span class="p">,</span>
2736 <span class="lineno">22 </span>                      <span class="n">sales</span><span class="o">=</span><span class="n">sales</span><span class="p">)</span>
2737 <span class="lineno">23 </span>    <span class="n">products</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">product</span><span class="p">)</span>
2738 </pre></div>
2739 
<p>Running this code gives you 5 products, each with 5 months of sales and a random sales quantity for each month.</p>
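<p>Because the quantities are random, every run produces different data. If you want repeatable output, for example to compare generated spreadsheets between runs, a dedicated <code>random.Random</code> instance with a fixed seed helps. In this sketch, <code>make_products</code>, its <code>seed</code> parameter, and the month-number sale IDs are illustrative assumptions, and the data classes are repeated so it runs on its own:</p>
<div class="highlight python"><pre><span></span>import random
from dataclasses import dataclass
from typing import List


# Same shape as the classes in db_classes.py, repeated so this runs on its own
@dataclass
class Sale:
    id: str
    quantity: int


@dataclass
class Product:
    id: str
    name: str
    sales: List[Sale]


def make_products(count=5, months=5, seed=None):
    """Create sample products; the same seed always produces equal data."""
    rng = random.Random(seed)  # private generator, independent of the global one
    products = []
    for idx in range(1, count + 1):
        sales = [Sale(id=str(month), quantity=rng.randrange(5, 100))
                 for month in range(1, months + 1)]
        products.append(Product(id=str(idx), name="Product %s" % idx, sales=sales))
    return products


first = make_products(seed=2019)
second = make_products(seed=2019)
print(first == second)  # data classes compare field by field
</pre></div>
<p>Using a separate generator instead of seeding the global <code>random</code> module keeps the sample data reproducible without affecting any other code that relies on random numbers.</p>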
2741 <p>Now, to convert this into a spreadsheet, you need to iterate over the data and append it to the spreadsheet:</p>
2742 <div class="highlight python"><pre><span></span><span class="lineno">25 </span><span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
2743 <span class="lineno">26 </span><span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
2744 <span class="lineno">27 </span>
2745 <span class="lineno">28 </span><span class="c1"># Append column names first</span>
2746 <span class="lineno">29 </span><span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">([</span><span class="s2">&quot;Product ID&quot;</span><span class="p">,</span> <span class="s2">&quot;Product Name&quot;</span><span class="p">,</span> <span class="s2">&quot;Month 1&quot;</span><span class="p">,</span>
2747 <span class="lineno">30 </span>              <span class="s2">&quot;Month 2&quot;</span><span class="p">,</span> <span class="s2">&quot;Month 3&quot;</span><span class="p">,</span> <span class="s2">&quot;Month 4&quot;</span><span class="p">,</span> <span class="s2">&quot;Month 5&quot;</span><span class="p">])</span>
2748 <span class="lineno">31 </span>
2749 <span class="lineno">32 </span><span class="c1"># Append the data</span>
2750 <span class="lineno">33 </span><span class="k">for</span> <span class="n">product</span> <span class="ow">in</span> <span class="n">products</span><span class="p">:</span>
2751 <span class="lineno">34 </span>    <span class="n">data</span> <span class="o">=</span> <span class="p">[</span><span class="n">product</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="n">product</span><span class="o">.</span><span class="n">name</span><span class="p">]</span>
2752 <span class="lineno">35 </span>    <span class="k">for</span> <span class="n">sale</span> <span class="ow">in</span> <span class="n">product</span><span class="o">.</span><span class="n">sales</span><span class="p">:</span>
2753 <span class="lineno">36 </span>        <span class="n">data</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sale</span><span class="o">.</span><span class="n">quantity</span><span class="p">)</span>
2754 <span class="lineno">37 </span>    <span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
2755 </pre></div>
2756 
2757 <p>That&rsquo;s it. That should allow you to create a spreadsheet with some data coming from your database.</p>
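<p>The loop above hard-codes five month columns in the header. Here&rsquo;s a sketch that derives the header from the data instead, assuming each product&rsquo;s sales are stored in month order. <code>build_header</code> and <code>product_to_row</code> are hypothetical names, and <code>SimpleNamespace</code> objects stand in for the <code>Product</code> and <code>Sale</code> data classes:</p>
<div class="highlight python"><pre><span></span>from types import SimpleNamespace


def build_header(products):
    """Build column names, with one 'Month N' column per sale of the longest history."""
    months = max((len(product.sales) for product in products), default=0)
    return ["Product ID", "Product Name"] + ["Month %d" % n for n in range(1, months + 1)]


def product_to_row(product):
    """Flatten one product into a spreadsheet row: id, name, then quantities."""
    return [product.id, product.name] + [sale.quantity for sale in product.sales]


# Stand-ins with the same attributes as the Product and Sale data classes
products = [
    SimpleNamespace(id="1", name="Product 1",
                    sales=[SimpleNamespace(id="1", quantity=q) for q in (12, 40)]),
    SimpleNamespace(id="2", name="Product 2",
                    sales=[SimpleNamespace(id="1", quantity=q) for q in (7, 55)]),
]

rows = [build_header(products)] + [product_to_row(p) for p in products]
for row in rows:
    print(row)  # with openpyxl, call sheet.append(row) instead
</pre></div>
<p>Passing each of these rows to <code>sheet.append()</code> gives the same layout as before, but the header keeps up if the number of months changes.</p>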
<p>However, why not use the chart knowledge you just gained to add a chart that displays that data more visually?</p>
2759 <p>All right, then you could probably do something like this:</p>
2760 <div class="highlight python"><pre><span></span><span class="lineno">38 </span><span class="n">chart</span> <span class="o">=</span> <span class="n">LineChart</span><span class="p">()</span>
2761 <span class="lineno">39 </span><span class="n">data</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
2762 <span class="lineno">40 </span>                 <span class="n">min_row</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
2763 <span class="lineno">41 </span>                 <span class="n">max_row</span><span class="o">=</span><span class="mi">6</span><span class="p">,</span>
2764 <span class="lineno">42 </span>                 <span class="n">min_col</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
2765 <span class="lineno">43 </span>                 <span class="n">max_col</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span>
2766 <span class="lineno">44 </span>
2767 <span class="lineno">45 </span><span class="n">chart</span><span class="o">.</span><span class="n">add_data</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">titles_from_data</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">from_rows</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
2768 <span class="lineno">46 </span><span class="n">sheet</span><span class="o">.</span><span class="n">add_chart</span><span class="p">(</span><span class="n">chart</span><span class="p">,</span> <span class="s2">&quot;B8&quot;</span><span class="p">)</span>
2769 <span class="lineno">47 </span>
2770 <span class="lineno">48 </span><span class="n">cats</span> <span class="o">=</span> <span class="n">Reference</span><span class="p">(</span><span class="n">worksheet</span><span class="o">=</span><span class="n">sheet</span><span class="p">,</span>
2771 <span class="lineno">49 </span>                 <span class="n">min_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
2772 <span class="lineno">50 </span>                 <span class="n">max_row</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
2773 <span class="lineno">51 </span>                 <span class="n">min_col</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
2774 <span class="lineno">52 </span>                 <span class="n">max_col</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span>
2775 <span class="lineno">53 </span><span class="n">chart</span><span class="o">.</span><span class="n">set_categories</span><span class="p">(</span><span class="n">cats</span><span class="p">)</span>
2776 <span class="lineno">54 </span>
2777 <span class="lineno">55 </span><span class="n">chart</span><span class="o">.</span><span class="n">x_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">&quot;Months&quot;</span>
2778 <span class="lineno">56 </span><span class="n">chart</span><span class="o">.</span><span class="n">y_axis</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="s2">&quot;Sales (per unit)&quot;</span>
2779 <span class="lineno">57 </span>
2780 <span class="lineno">58 </span><span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;oop_sample.xlsx&quot;</span><span class="p">)</span>
2781 </pre></div>
2782 
2783 <p>Now we&rsquo;re talking! Here&rsquo;s a spreadsheet generated from database objects and with a chart and everything:</p>
2784 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.26.23.1f355e76586d.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.26.23.1f355e76586d.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.26.23.1f355e76586d.png&amp;w=540&amp;sig=135f4ee5413467c91f65bbb6e914724cdf1fd413 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.26.23.1f355e76586d.png&amp;w=1080&amp;sig=5b7dd165f92c237ddd049350ca6fc6a165e10512 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.26.23.1f355e76586d.png 2160w" sizes="75vw" alt="Example Spreadsheet With Conversion from Python Data Classes"/></a></p>
2785 <p>That&rsquo;s a great way for you to wrap up your new knowledge of charts!</p>
2786 <h3 id="bonus-working-with-pandas">Bonus: Working With Pandas</h3>
<p>Even though you can use <a href="https://realpython.com/working-with-large-excel-files-in-pandas/">Pandas to handle Excel files</a>, there are a few things that you either can&rsquo;t accomplish with Pandas or that you&rsquo;d be better off doing with <code>openpyxl</code> directly.</p>
<p>For example, one advantage of using <code>openpyxl</code> is the ability to easily customize your spreadsheet with styles, conditional formatting, and so on.</p>
<p>But guess what? You don&rsquo;t have to pick one. In fact, <code>openpyxl</code> supports both converting data from a Pandas DataFrame into a workbook and the opposite, converting an <code>openpyxl</code> workbook into a Pandas DataFrame.</p>
2790 <div class="alert alert-primary" role="alert">
<p><strong>Note:</strong> If you&rsquo;re new to Pandas, check out our <a href="https://realpython.com/courses/pandas-dataframes-101/">course on Pandas DataFrames</a> beforehand.</p>
2792 </div>
2793 <p>First things first, remember to install the <code>pandas</code> package:</p>
2794 <div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install pandas
2795 </pre></div>
2796 
2797 <p>Then, let&rsquo;s create a sample DataFrame:</p>
<div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>

<span class="n">data</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">&quot;Product Name&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;Product 1&quot;</span><span class="p">,</span> <span class="s2">&quot;Product 2&quot;</span><span class="p">],</span>
    <span class="s2">&quot;Sales Month 1&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">],</span>
    <span class="s2">&quot;Sales Month 2&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">5</span><span class="p">,</span> <span class="mi">35</span><span class="p">],</span>
<span class="p">}</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
2806 </pre></div>
2807 
<p>Now that you have some data, you can use <code>dataframe_to_rows()</code> to convert it from a DataFrame into a worksheet:</p>
<div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">Workbook</span>
<span class="kn">from</span> <span class="nn">openpyxl.utils.dataframe</span> <span class="k">import</span> <span class="n">dataframe_to_rows</span>

<span class="n">workbook</span> <span class="o">=</span> <span class="n">Workbook</span><span class="p">()</span>
<span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>

<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">dataframe_to_rows</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
    <span class="n">sheet</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>

<span class="n">workbook</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">&quot;pandas.xlsx&quot;</span><span class="p">)</span>
2819 </pre></div>
2820 
2821 <p>You should see a spreadsheet that looks like this:</p>
2822 <p><a href="https://files.realpython.com/media/Screenshot_2019-06-24_21.42.15.0a4208db25f0.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Screenshot_2019-06-24_21.42.15.0a4208db25f0.png" width="2160" height="1414" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.42.15.0a4208db25f0.png&amp;w=540&amp;sig=636303dd8f99512651c5868f4ef572b2afa75d3c 540w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screenshot_2019-06-24_21.42.15.0a4208db25f0.png&amp;w=1080&amp;sig=c2ddf015c46566fc545ad03187cc5780a61938f9 1080w, https://files.realpython.com/media/Screenshot_2019-06-24_21.42.15.0a4208db25f0.png 2160w" sizes="75vw" alt="Example Spreadsheet With Data from Pandas Data Frame"/></a></p>
<p>If you want to add the <a href="https://realpython.com/python-data-cleaning-numpy-pandas/#changing-the-index-of-a-dataframe">DataFrame&rsquo;s index</a>, set <code>index=True</code>, and each row&rsquo;s index is added to your spreadsheet.</p>
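<p>To make the row layout concrete, here is a simplified, stdlib-only sketch of the stream of rows that <code>dataframe_to_rows()</code> hands to <code>sheet.append()</code>. It is an approximation, not openpyxl&rsquo;s implementation: a plain dict of equal-length column lists stands in for the DataFrame, and the hypothetical <code>labels</code> argument plays the role of the index. The real function may also emit extra rows (for example, for index names or multi-level columns), which this sketch leaves out.</p>

```python
def columns_to_rows(columns, labels=None, header=True):
    """Yield spreadsheet rows from a dict that maps column names to value lists.

    Simplified stand-in for openpyxl's dataframe_to_rows(): `columns` plays
    the role of the DataFrame and the hypothetical `labels` list plays the
    role of its index (None means index=False).
    """
    names = list(columns)
    if header:
        # Leave the cell above the index column empty when labels are used
        yield ([None] if labels is not None else []) + names
    n_rows = len(columns[names[0]]) if names else 0
    for i in range(n_rows):
        values = [columns[name][i] for name in names]
        # Prepend the row label only when an index is requested
        yield ([labels[i]] if labels is not None else []) + values


data = {
    "Product Name": ["Product 1", "Product 2"],
    "Sales Month 1": [10, 20],
    "Sales Month 2": [5, 35],
}
for row in columns_to_rows(data):
    print(row)
```

<p>Passing <code>labels=[0, 1]</code> prepends each row&rsquo;s label and leaves the header cell above it empty, which roughly mirrors what <code>index=True</code> does for a default DataFrame index.</p>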
<p>On the other hand, converting a spreadsheet into a DataFrame is just as straightforward:</p>
2825 <div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
2826 <span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
2827 
2828 <span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample.xlsx&quot;</span><span class="p">)</span>
2829 <span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
2830 
2831 <span class="n">values</span> <span class="o">=</span> <span class="n">sheet</span><span class="o">.</span><span class="n">values</span>
2832 <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">values</span><span class="p">)</span>
2833 </pre></div>
2834 
2835 <p>Alternatively, if you want to add the correct headers and use the review ID as the index, for example, then you can also do it like this instead:</p>
2836 <div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
2837 <span class="kn">from</span> <span class="nn">openpyxl</span> <span class="k">import</span> <span class="n">load_workbook</span>
2838 <span class="kn">from</span> <span class="nn">mapping</span> <span class="k">import</span> <span class="n">REVIEW_ID</span>
2839 
2840 <span class="n">workbook</span> <span class="o">=</span> <span class="n">load_workbook</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s2">&quot;sample.xlsx&quot;</span><span class="p">)</span>
2841 <span class="n">sheet</span> <span class="o">=</span> <span class="n">workbook</span><span class="o">.</span><span class="n">active</span>
2842 
2843 <span class="n">data</span> <span class="o">=</span> <span class="n">sheet</span><span class="o">.</span><span class="n">values</span>
2844 
2845 <span class="c1"># Set the first row as the columns for the DataFrame</span>
2846 <span class="n">cols</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
2847 <span class="n">data</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
2848 
2849 <span class="c1"># Set the field &quot;review_id&quot; as the indexes for each row</span>
2850 <span class="n">idx</span> <span class="o">=</span> <span class="p">[</span><span class="n">row</span><span class="p">[</span><span class="n">REVIEW_ID</span><span class="p">]</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
2851 
2852 <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="n">idx</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="n">cols</span><span class="p">)</span>
2853 </pre></div>
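<p>The header-and-index trick above works because <code>sheet.values</code> is a generator of row tuples: <code>next()</code> consumes the first row as the column names, and <code>list()</code> drains the rest. Here is a stdlib-only sketch of the same idea, using a hypothetical <code>split_header_and_index()</code> helper and a small in-memory generator standing in for <code>sheet.values</code>. The sample review IDs and ratings are made up for illustration, and unlike the tutorial&rsquo;s code, which imports a numeric <code>REVIEW_ID</code> position from its <code>mapping</code> module, this sketch looks the position up by column name.</p>

```python
def split_header_and_index(rows, key_column):
    """Split an iterable of row tuples into (columns, index, data).

    The first row becomes the column names, and the value in `key_column`
    of each remaining row becomes that row's label.
    """
    rows = iter(rows)
    columns = next(rows)  # consume the header row
    data = list(rows)     # drain the remaining data rows
    position = columns.index(key_column)
    index = [row[position] for row in data]
    return columns, index, data


def fake_sheet_values():
    # Hypothetical stand-in for openpyxl's sheet.values generator
    yield ("review_id", "star_rating")
    yield ("R_EXAMPLE_1", 5)
    yield ("R_EXAMPLE_2", 3)


cols, idx, data = split_header_and_index(fake_sheet_values(), "review_id")

# A dict keyed by the index gives label-based lookups, loosely like df.loc
by_id = {label: dict(zip(cols, row)) for label, row in zip(idx, data)}
print(by_id["R_EXAMPLE_2"]["star_rating"])
```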
2854 
2855 <p>Using indexes and columns allows you to access data from your DataFrame easily:</p>
<div class="highlight python repl"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">df</span><span class="o">.</span><span class="n">columns</span>
2857 <span class="go">Index([&#39;marketplace&#39;, &#39;customer_id&#39;, &#39;review_id&#39;, &#39;product_id&#39;,</span>
2858 <span class="go">       &#39;product_parent&#39;, &#39;product_title&#39;, &#39;product_category&#39;, &#39;star_rating&#39;,</span>
2859 <span class="go">       &#39;helpful_votes&#39;, &#39;total_votes&#39;, &#39;vine&#39;, &#39;verified_purchase&#39;,</span>
2860 <span class="go">       &#39;review_headline&#39;, &#39;review_body&#39;, &#39;review_date&#39;],</span>
2861 <span class="go">      dtype=&#39;object&#39;)</span>
2862 
2863 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Get first 10 reviews&#39; star rating</span>
2864 <span class="gp">&gt;&gt;&gt; </span><span class="n">df</span><span class="p">[</span><span class="s2">&quot;star_rating&quot;</span><span class="p">][:</span><span class="mi">10</span><span class="p">]</span>
2865 <span class="go">R3O9SGZBVQBV76    5</span>
2866 <span class="go">RKH8BNC3L5DLF     5</span>
2867 <span class="go">R2HLE8WKZSU3NL    2</span>
2868 <span class="go">R31U3UH5AZ42LL    5</span>
2869 <span class="go">R2SV659OUJ945Y    4</span>
2870 <span class="go">RA51CP8TR5A2L     5</span>
2871 <span class="go">RB2Q7DLDN6TH6     5</span>
2872 <span class="go">R2RHFJV0UYBK3Y    1</span>
2873 <span class="go">R2Z6JOQ94LFHEP    5</span>
2874 <span class="go">RX27XIIWY5JPB     4</span>
2875 <span class="go">Name: star_rating, dtype: int64</span>
2876 
2877 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Grab review with id &quot;R2EQL1V1L6E0C9&quot;, using the index</span>
2878 <span class="gp">&gt;&gt;&gt; </span><span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="s2">&quot;R2EQL1V1L6E0C9&quot;</span><span class="p">]</span>
2879 <span class="go">marketplace               US</span>
2880 <span class="go">customer_id         15305006</span>
2881 <span class="go">review_id     R2EQL1V1L6E0C9</span>
2882 <span class="go">product_id        B004LURNO6</span>
2883 <span class="go">product_parent     892860326</span>
2884 <span class="go">review_headline   Five Stars</span>
2885 <span class="go">review_body          Love it</span>
2886 <span class="go">review_date       2015-08-31</span>
2887 <span class="go">Name: R2EQL1V1L6E0C9, dtype: object</span>
2888 </pre></div>
2889 
<p>There you go! Whether you want to use <code>openpyxl</code> to prettify your Pandas dataset or use Pandas to do some hardcore algebra, you now know how to switch between both packages.</p>
2891 <h2 id="conclusion">Conclusion</h2>
2892 <p><em>Phew</em>, after that long read, you now know how to work with spreadsheets in Python! You can rely on <code>openpyxl</code>, your trustworthy companion, to:</p>
2893 <ul>
2894 <li>Extract valuable information from spreadsheets in a Pythonic manner</li>
2895 <li>Create your own spreadsheets, no matter the complexity level</li>
2896 <li>Add cool features such as conditional formatting or charts to your spreadsheets</li>
2897 </ul>
2898 <p>There are a few other things you can do with <code>openpyxl</code> that might not have been covered in this tutorial, but you can always check the package&rsquo;s official <a href="https://openpyxl.readthedocs.io/en/stable/index.html">documentation website</a> to learn more about it. You can even venture into checking its <a href="https://bitbucket.org/openpyxl/openpyxl/src/default/">source code</a> and improving the package further.</p>
2899 <p>Feel free to leave any comments below if you have any questions, or if there&rsquo;s any section you&rsquo;d love to hear more about.</p>
2900 <div class="alert alert-warning" role="alert"><p><strong>Download Dataset:</strong> <a href="https://realpython.com/optins/view/openpyxl-sample-dataset/" class="alert-link" data-toggle="modal" data-target="#modal-openpyxl-sample-dataset" data-focus="false">Click here to download the dataset for the openpyxl exercise you'll be following in this tutorial.</a></p></div>
2901         <hr />
        <p><em>[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short &amp; sweet Python Trick delivered to your inbox every couple of days. <a href="https://realpython.com/python-tricks/?utm_source=realpython&amp;utm_medium=rss&amp;utm_campaign=footer">&gt;&gt; Click here to learn more and see examples</a> ]</em></p>#
2903 datePublished: #Mon Aug 26 14:00:00 2019#
2904 dateUpdated: #Mon Aug 26 14:00:00 2019#
2905 # Person begin ####################
2906 name: #Real Python#
2907 # Person end ######################
2908 # Item end ########################
2909 # Item begin ######################
2910 id: #https://realpython.com/cpython-source-code-guide/#
2911 title: #Your Guide to the CPython Source Code#
2912 link: #https://realpython.com/cpython-source-code-guide/#
2913 description: #In this detailed Python tutorial, you&apos;ll explore the CPython source code. By following this step-by-step walkthrough, you&apos;ll take a deep dive into how the CPython compiler works and how your Python code gets executed.#
content: #<p>Are there certain parts of Python that just seem magic? For example, how are dictionaries so much faster than looping over a list to find an item? How does a generator remember the state of its variables each time it yields a value, and why do you never have to allocate memory like you do in other languages? It turns out that CPython, the most popular Python runtime, is written in human-readable C and Python code. This tutorial will walk you through the CPython source code.</p>
<p>You&rsquo;ll cover all the concepts behind the internals of CPython and how they work, with visual explanations as you go.</p>
2916 <p><strong>You&rsquo;ll learn how to:</strong></p>
2917 <ul>
2918 <li>Read and navigate the source code</li>
2919 <li>Compile CPython from source code</li>
2920 <li>Navigate and comprehend the inner workings of concepts like lists, dictionaries, and generators</li>
2921 <li>Run the test suite</li>
2922 <li>Modify or upgrade components of the CPython library to contribute them to future versions</li>
2923 </ul>
2924 <p>Yes, this is a very long article. If you just made yourself a fresh cup of tea, coffee or your favorite beverage, it&rsquo;s going to be cold by the end of Part 1. </p>
<p>This tutorial is split into five parts. Take your time with each part, and make sure you try out the demos and the interactive components. You&rsquo;ll feel a sense of achievement as you grasp the core concepts of Python that can make you a better Python programmer.</p>
2926 <div class="alert alert-warning" role="alert"><p><strong>Free Bonus:</strong> <a href="" class="alert-link" data-toggle="modal" data-target="#modal-python-mastery-course" data-focus="false">5 Thoughts On Python Mastery</a>, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.</p></div>
2927 
2928 <h2 h1="h1" id="part-1-introduction-to-cpython">Part 1: Introduction to CPython</h2>
<p>When you type <code>python</code> at the console or install a Python distribution from <a href="https://www.python.org">python.org</a>, you are running <strong>CPython</strong>. CPython is one of many Python runtimes, each maintained and written by a different team of developers. Some other runtimes you may have heard of are <a href="https://pypy.org/">PyPy</a>, <a href="https://cython.org/">Cython</a>, and <a href="https://www.jython.org/">Jython</a>.</p>
<p>The unique thing about CPython is that it contains both a runtime and the shared language specification that all Python runtimes use. CPython is the &ldquo;official,&rdquo; or reference, implementation of Python.</p>
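<p>If you&rsquo;re ever unsure which runtime is executing your code, the standard library can tell you. This short sketch uses <code>platform.python_implementation()</code>, which returns a string such as <code>"CPython"</code> or <code>"PyPy"</code>; the <code>describe_runtime()</code> helper name is just for illustration.</p>

```python
import platform
import sys


def describe_runtime():
    """Return a one-line description of the interpreter running this code."""
    implementation = platform.python_implementation()  # e.g. "CPython"
    version = platform.python_version()                # e.g. "3.8.0b4"
    return f"{implementation} {version} on {sys.platform}"


print(describe_runtime())
```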
<p>The Python language specification is the document that describes the Python language. For example, it says that <code>assert</code> is a reserved keyword, and that <code>[]</code> is used for indexing, slicing, and creating empty lists.</p>
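<p>You can poke at both of these rules from any Python interpreter. The sketch below uses the standard library&rsquo;s <code>keyword</code> module to confirm that <code>assert</code> is reserved, and then shows <code>[]</code> doing its three jobs.</p>

```python
import keyword

# The language specification reserves "assert", so it can't be a variable name
print(keyword.iskeyword("assert"))

letters = ["a", "b", "c", "d"]
empty = []             # [] on its own creates an empty list
first = letters[0]     # [] with an integer indexes
middle = letters[1:3]  # [] with a colon slices
print(empty, first, middle)
```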
2932 <p>Think about what you expect to be inside the Python distribution on your computer:</p>
2933 <ul>
2934 <li>When you type <code>python</code> without a file or module, it gives an interactive prompt.</li>
2935 <li>You can import built-in modules from the standard library like <code>json</code>.</li>
2936 <li>You can install packages from the internet using <code>pip</code>.</li>
2937 <li>You can test your applications using the built-in <code>unittest</code> library.</li>
2938 </ul>
2939 <p>These are all part of the CPython distribution. There&rsquo;s a lot more than just a compiler.</p>
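<p>As a small illustration of two of those bundled pieces working together, this sketch tests a <code>json</code> round trip with <code>unittest</code>, using nothing outside the standard library. The test case name and sample data are made up for the example.</p>

```python
import json
import unittest


class JsonRoundTripTest(unittest.TestCase):
    """Check that data survives being dumped to and loaded from JSON."""

    def test_round_trip(self):
        original = {"runtime": "CPython", "version": [3, 8, 0]}
        text = json.dumps(original)
        self.assertEqual(json.loads(text), original)


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests finish
    unittest.main(argv=["json-round-trip"], exit=False)
```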
2940 <div class="alert alert-primary" role="alert">
2941 <p><strong>Note:</strong> This article is written against version <a href="https://github.com/python/cpython/tree/v3.8.0b4">3.8.0b4</a> of the CPython source code.</p>
2942 </div>
2943 <h3 id="whats-in-the-source-code">What&rsquo;s in the Source Code?</h3>
<p>The CPython source distribution comes with a whole range of tools, libraries, and components. We&rsquo;ll explore those in this article. First, we&rsquo;ll focus on the compiler.</p>
2945 <p>To download a copy of the CPython source code, you can use <code>git</code> to pull the latest version to a working copy locally:</p>
<div class="highlight sh"><pre><span></span><span class="gp">$</span> git clone https://github.com/python/cpython
<span class="gp">$</span> cd cpython
<span class="gp">$</span> git checkout v3.8.0b4
2949 </pre></div>
2950 
2951 <div class="alert alert-primary" role="alert">
2952 <p><strong>Note:</strong> If you don&rsquo;t have Git available, you can download the source in a <a href="https://github.com/python/cpython/archive/v3.8.0b4.zip">ZIP</a> file directly from the GitHub website.</p>
2953 </div>
2954 <p>Inside of the newly downloaded <code>cpython</code> directory, you will find the following subdirectories:</p>
2955 <div class="highlight"><pre><span></span>cpython/
│
├── Doc      ← Source for the documentation
├── Grammar  ← The computer-readable language definition
├── Include  ← The C header files
├── Lib      ← Standard library modules written in Python
├── Mac      ← macOS support files
├── Misc     ← Miscellaneous files
├── Modules  ← Standard library modules written in C
├── Objects  ← Core types and the object model
├── Parser   ← The Python parser source code
├── PC       ← Windows build support files
├── PCbuild  ← Windows build support files for older Windows versions
├── Programs ← Source code for the python executable and other binaries
├── Python   ← The CPython interpreter source code
└── Tools    ← Standalone tools useful for building or extending Python
2971 </pre></div>
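<p>The split between <code>Lib</code> (modules written in Python) and <code>Modules</code> (modules written in C) is visible from a running interpreter, too. This sketch uses <code>importlib.util.find_spec()</code> to report roughly where a module would be loaded from. The <code>module_origin()</code> helper and its labels are illustrative, and the result for a given module can vary between builds and Python versions, because some C modules are compiled into the interpreter while others ship as separate extension files.</p>

```python
import importlib.util


def module_origin(name):
    """Roughly classify where the import system would load `name` from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    origin = spec.origin
    if origin in ("built-in", "frozen"):
        # Compiled into the interpreter binary (C code or frozen bytecode)
        return origin
    if origin is None:
        return "no file (for example, a namespace package)"
    if origin.endswith((".py", ".pyc")):
        # A pure-Python module, typically from the Lib/ directory
        return "Python source"
    # Anything else is usually a compiled extension module (.so or .pyd)
    return "extension module"


for name in ("json", "sys", "math"):
    print(f"{name}: {module_origin(name)}")
```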
2972 
<p>Next, we&rsquo;ll compile CPython from the source code. This step requires a C compiler and some build tools, which depend on the operating system you&rsquo;re using.</p>
2974 <h3 id="compiling-cpython-macos">Compiling CPython (macOS)</h3>
2975 <p>Compiling CPython on macOS is straightforward. You will first need the essential C compiler toolkit. The Command Line Development Tools is an app that you can update in macOS through the App Store. You need to perform the initial installation on the terminal.</p>
<p>To open up a terminal in macOS, go to the Launchpad, then <em>Other</em>, and then choose the <em>Terminal</em> app. You will want to save this app to your Dock, so right-click the icon and select <em>Keep in Dock</em>.</p>
2977 <p>Now, within the terminal, install the C compiler and toolkit by running the following:</p>
2978 <div class="highlight sh"><pre><span></span><span class="gp">$</span> xcode-select --install
2979 </pre></div>
2980 
<p>This command will pop up a prompt to download and install a set of tools, including Git, Make, and the Clang C compiler.</p>
<p>You will also need a working copy of <a href="https://www.openssl.org/">OpenSSL</a> to fetch packages from PyPI. SSL validation is required if you later plan to use this build to install additional packages.</p>
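<p>Once you have a build, one quick way to confirm that it picked up OpenSSL and validates certificates by default is to inspect the <code>ssl</code> module from the new interpreter. This is a minimal sketch: if the <code>import ssl</code> line itself fails, the build was made without SSL support. <code>ssl.create_default_context()</code> is the standard library&rsquo;s recommended way to get a client context with certificate and hostname checking turned on.</p>

```python
import ssl

# Which OpenSSL library this interpreter was linked against
print(ssl.OPENSSL_VERSION)

# The default client context verifies server certificates and hostnames,
# the kind of validation that fetching packages from PyPI depends on
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```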
<p>The simplest way to install OpenSSL on macOS is by using <a href="https://brew.sh">Homebrew</a>. If you already have Homebrew installed, you can install the dependencies for CPython with the <code>brew install</code> command:</p>
2984 <div class="highlight sh"><pre><span></span><span class="gp">$</span> brew install openssl xz zlib
2985 </pre></div>
2986 
<p>Now that you have the dependencies, you can run the <code>configure</code> script, enabling SSL support by pointing it at the location where Homebrew installed OpenSSL, and enabling the debug hooks with <code>--with-pydebug</code>:</p>
2988 <div class="highlight sh"><pre><span></span><span class="gp">$</span> <span class="nv">CPPFLAGS</span><span class="o">=</span><span class="s2">&quot;-I</span><span class="k">$(</span>brew --prefix zlib<span class="k">)</span><span class="s2">/include&quot;</span> <span class="se">\</span>
2989  <span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">&quot;-L</span><span class="k">$(</span>brew --prefix zlib<span class="k">)</span><span class="s2">/lib&quot;</span> <span class="se">\</span>
2990  ./configure --with-openssl<span class="o">=</span><span class="k">$(</span>brew --prefix openssl<span class="k">)</span> --with-pydebug
2991 </pre></div>
2992 
2993 <p>This will generate a <code>Makefile</code> in the root of the repository that you can use to automate the build process. The <code>./configure</code> step only needs to be run once. You can build the CPython binary by running:</p>
2994 <div class="highlight sh"><pre><span></span><span class="gp">$</span> make -j2 -s
2995 </pre></div>
2996 
2997 <p>The <code>-j2</code> flag allows <code>make</code> to run 2 jobs simultaneously. If you have 4 cores, you can change this to 4. The <code>-s</code> flag stops the <code>Makefile</code> from printing every command it runs to the console. You can remove this, but the output is very verbose.</p>
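<p>If you&rsquo;d rather not guess the core count, Python can report it. This sketch only builds the command line as a list and doesn&rsquo;t run <code>make</code>; the <code>make_command()</code> helper name is hypothetical. Because <code>os.cpu_count()</code> can return <code>None</code> when the count is unknown, it falls back to two jobs.</p>

```python
import os


def make_command(quiet=True):
    """Build a `make` invocation with one parallel job per CPU core."""
    jobs = os.cpu_count() or 2  # cpu_count() may return None
    command = ["make", f"-j{jobs}"]
    if quiet:
        command.append("-s")    # -s: don't echo every command
    return command


print(" ".join(make_command()))
```

<p>From the root of the repository, you could pass the resulting list to <code>subprocess.run(make_command(), check=True)</code> to start the build.</p>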
2998 <p>During the build, you may receive some errors, and in the summary, it will notify you that not all packages could be built. For example, <code>_dbm</code>, <code>_sqlite3</code>, <code>_uuid</code>, <code>nis</code>, <code>ossaudiodev</code>, <code>spwd</code>, and <code>_tkinter</code> would fail to build with this set of instructions. That&rsquo;s okay if you aren&rsquo;t planning on developing against those packages. If you are, then check out the <a href="https://devguide.python.org/">dev guide</a> website for more information.</p>
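<p>After the build finishes, you can check which of those optional modules your interpreter can find by running a short script with the new binary (for example, <code>./python.exe check_modules.py</code>, where the file name is just an example). This sketch relies on <code>importlib.util.find_spec()</code>, which returns <code>None</code> for modules it can&rsquo;t find. Some of these names are platform-specific (<code>nis</code> and <code>spwd</code>, for instance, only exist on some Unix-like systems), so a &ldquo;missing&rdquo; result isn&rsquo;t always a build failure.</p>

```python
import importlib.util

# Optional modules that the macOS build summary may report as not built
OPTIONAL_MODULES = ["_dbm", "_sqlite3", "_uuid", "nis", "ossaudiodev", "spwd", "_tkinter"]


def missing_modules(names):
    """Return the subset of `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(OPTIONAL_MODULES)
for name in OPTIONAL_MODULES:
    print(f"{name:12} {'missing' if name in missing else 'available'}")
```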
2999 <p>The build will take a few minutes and generate a binary called <code>python.exe</code>.  Every time you make changes to the source code, you will need to re-run <code>make</code> with the same flags.
3000 The <code>python.exe</code> binary is the debug binary of CPython. Execute <code>python.exe</code> to see a working REPL:</p>
3001 <div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe
3002 <span class="go">Python 3.8.0b4 (tags/v3.8.0b4:d93605de72, Aug 30 2019, 10:00:03) </span>
3003 <span class="go">[Clang 10.0.1 (clang-1001.0.46.4)] on darwin</span>
3004 <span class="go">Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.</span>
3005 <span class="gp">&gt;</span>&gt;&gt; 
3006 </pre></div>
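<p>Because you configured with <code>--with-pydebug</code>, you can confirm from inside the REPL that you&rsquo;re running a debug build. One minimal check, sketched below, relies on <code>sys.gettotalrefcount()</code>, which is only compiled into debug builds of CPython; <code>is_debug_build()</code> is an illustrative helper name.</p>

```python
import sys


def is_debug_build():
    """Return True when running a CPython debug build (e.g. --with-pydebug)."""
    # sys.gettotalrefcount() only exists in debug builds of CPython
    return hasattr(sys, "gettotalrefcount")


if is_debug_build():
    print("Debug build, total references:", sys.gettotalrefcount())
else:
    print("Release build")
```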
3007 
3008 <div class="alert alert-primary" role="alert">
3009 <p><strong>Note:</strong> 
Yes, that&rsquo;s right, the macOS build has the file extension <code>.exe</code>. This is <em>not</em> because it&rsquo;s a Windows binary. It&rsquo;s because macOS has a case-insensitive filesystem by default, and the developers didn&rsquo;t want people to accidentally refer to the directory <code>Python/</code> when working with the binary, so <code>.exe</code> was appended to avoid ambiguity.
3011 If you later run <code>make install</code> or <code>make altinstall</code>, it will rename the file back to <code>python</code>.</p>
3012 </div>
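<p>If you want to see the case-insensitivity problem for yourself, this sketch creates a lowercase file in a temporary directory and then asks whether the same name in a different case exists. On a default macOS volume it typically reports <code>True</code>; on most Linux filesystems it reports <code>False</code>. The <code>is_case_insensitive()</code> helper is hypothetical and not part of CPython&rsquo;s build.</p>

```python
import os
import tempfile


def is_case_insensitive(directory=None):
    """Return True if `directory` treats "python" and "PYTHON" as the same name."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        # Create a lowercase file, then look it up using a different case
        open(os.path.join(tmp, "python"), "w").close()
        return os.path.exists(os.path.join(tmp, "PYTHON"))


print(is_case_insensitive())
```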
3013 <h3 id="compiling-cpython-linux">Compiling CPython (Linux)</h3>
3014 <p>For Linux, the first step is to download and install <code>make</code>, <code>gcc</code>, <code>configure</code>, and <code>pkgconfig</code>. </p>
3015 <p>For Fedora Core, RHEL, CentOS, or other yum-based systems: </p>
3016 <div class="highlight sh"><pre><span></span><span class="gp">$</span> sudo yum install yum-utils
3017 </pre></div>
3018 
3019 <p>For Debian, Ubuntu, or other <code>apt</code>-based systems:</p>
3020 <div class="highlight sh"><pre><span></span><span class="gp">$</span> sudo apt install build-essential
3021 </pre></div>
3022 
<p>Then install the required packages. For Fedora Core, RHEL, CentOS, or other yum-based systems:</p>
3024 <div class="highlight sh"><pre><span></span><span class="gp">$</span> sudo yum-builddep python3
3025 </pre></div>
3026 
3027 <p>For Debian, Ubuntu, or other <code>apt</code>-based systems:</p>
3028 <div class="highlight sh"><pre><span></span><span class="gp">$</span> sudo apt install libssl-dev zlib1g-dev libncurses5-dev <span class="se">\</span>
3029   libncursesw5-dev libreadline-dev libsqlite3-dev libgdbm-dev <span class="se">\</span>
3030   libdb5.3-dev libbz2-dev libexpat1-dev liblzma-dev libffi-dev
3031 </pre></div>
3032 
3033 <p>Now that you have the dependencies, you can run the <code>configure</code> script, enabling the debug hooks <code>--with-pydebug</code>:</p>
3034 <div class="highlight sh"><pre><span></span><span class="gp">$</span> ./configure --with-pydebug
3035 </pre></div>
3036 
3037 <p>Review the output to ensure that OpenSSL support was marked as <code>YES</code>. Otherwise, check with your distribution for instructions on installing the headers for OpenSSL.</p>
3038 <p>Next, you can build the CPython binary by running the generated <code>Makefile</code>:</p>
3039 <div class="highlight sh"><pre><span></span><span class="gp">$</span> make -j2 -s
3040 </pre></div>
3041 
3042 <p>During the build, you may receive some errors, and in the summary, it will notify you that not all packages could be built. That&rsquo;s okay if you aren&rsquo;t planning on developing against those packages. If you are, then check out the <a href="https://devguide.python.org/">dev guide</a> website for more information.</p>
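<p>The build will take a few minutes and generate a binary called <code>python</code>. This is the debug binary of CPython. Execute <code>./python</code> to see a working REPL. The banner below was captured on macOS, so on Linux the compiler and platform details will reflect your own system:</p>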
3044 <div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python
3045 <span class="go">Python 3.8.0b4 (tags/v3.8.0b4:d93605de72, Aug 30 2019, 10:00:03) </span>
3046 <span class="go">[Clang 10.0.1 (clang-1001.0.46.4)] on darwin</span>
3047 <span class="go">Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.</span>
3048 <span class="gp">&gt;</span>&gt;&gt; 
3049 </pre></div>
3050 
3051 <h3 id="compiling-cpython-windows">Compiling CPython (Windows)</h3>
<p>Inside the <code>PCbuild</code> folder is a Visual Studio solution file for building and exploring CPython. To use it, you need to have Visual Studio installed on your PC.</p>
3053 <p>The newest version of Visual Studio, Visual Studio 2019, makes it easier to work with Python and the CPython source code, so it is recommended for use in this tutorial. If you already have Visual Studio 2017 installed, that would also work fine.</p>
3054 <p>None of the paid features are required for compiling CPython or this tutorial. You can use the Community edition of Visual Studio, which is available for free from <a href="https://visualstudio.microsoft.com/vs/">Microsoft&rsquo;s Visual Studio website</a>.</p>
3055 <p>Once you&rsquo;ve downloaded the installer, you&rsquo;ll be asked to select which components you want to install. The bare minimum for this tutorial is:</p>
3056 <ul>
3057 <li>The <strong>Python Development</strong> workload</li>
3058 <li>The optional <strong>Python native development tools</strong></li>
3059 <li>Python 3 64-bit (3.7.2) (can be deselected if you already have Python 3.7 installed)</li>
3060 </ul>
<p>Any other optional features can be deselected if you want to be more conservative with disk space:</p>
3062 <p><a href="https://files.realpython.com/media/Screen_Shot_2019-08-22_at_2.47.23_pm.5e8682a89503.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-08-22_at_2.47.23_pm.5e8682a89503.png" width="2504" height="1260" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-08-22_at_2.47.23_pm.5e8682a89503.png&amp;w=626&amp;sig=86eb9f82580a69f533983087ba0fa4faf0d5bf96 626w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-08-22_at_2.47.23_pm.5e8682a89503.png&amp;w=1252&amp;sig=fe2157486f81073eabe043b3c441af23dd67b78a 1252w, https://files.realpython.com/media/Screen_Shot_2019-08-22_at_2.47.23_pm.5e8682a89503.png 2504w" sizes="75vw" alt="Visual Studio Options Window"/></a></p>
3063 <p>The installer will then download and install all of the required components. The installation could take an hour, so you may want to read on and come back to this section.</p>
<p>Once the installer has completed, click the <em>Launch</em> button to start Visual Studio. You will be prompted to sign in. If you have a Microsoft account, you can log in, or you can skip that step.</p>
<p>Once Visual Studio starts, you will be prompted to open a project. A shortcut to getting started with the Git configuration and cloning CPython is to choose the <em>Clone or check out code</em> option:</p>
3066 <p><a href="https://files.realpython.com/media/Capture3.e19765d74ec4.PNG" target="_blank"><img class="img-fluid mx-auto d-block w-50" src="https://files.realpython.com/media/Capture3.e19765d74ec4.PNG" width="2048" height="1420" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture3.e19765d74ec4.PNG&amp;w=512&amp;sig=e475e2a09cd780894f108850f91736c53ac95f27 512w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture3.e19765d74ec4.PNG&amp;w=1024&amp;sig=47d66a316c2546c9787e5af4cb4a9a006dca936d 1024w, https://files.realpython.com/media/Capture3.e19765d74ec4.PNG 2048w" sizes="75vw" alt="Choosing a Project Type in Visual Studio"/></a></p>
3067 <p>For the project URL, type <code>https://github.com/python/cpython</code> to clone:</p>
3068 <p><a href="https://files.realpython.com/media/Capture4.ea01418a971c.PNG" target="_blank"><img class="img-fluid mx-auto d-block w-50" src="https://files.realpython.com/media/Capture4.ea01418a971c.PNG" width="2048" height="1420" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture4.ea01418a971c.PNG&amp;w=512&amp;sig=47ed81b234652446a4f0e77e2a3f70e0074ac222 512w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture4.ea01418a971c.PNG&amp;w=1024&amp;sig=de393823fe557847f295d040573bd061b2ccd557 1024w, https://files.realpython.com/media/Capture4.ea01418a971c.PNG 2048w" sizes="75vw" alt="Cloning projects in Visual Studio"/></a></p>
3069 <p>Visual Studio will then download a copy of CPython from GitHub using the version of Git bundled with Visual Studio. This step also saves you the hassle of having to install Git on Windows. The download may take 10 minutes.</p>
3070 <p>Once the project has downloaded, you need to point it to the <strong><code>pcbuild</code></strong> solution file by clicking on <em>Solutions and Projects</em> and selecting <code>pcbuild.sln</code>:</p>
3071 <p><a href="https://files.realpython.com/media/Capture6.3d06a62b8e87.PNG" target="_blank"><img class="img-fluid mx-auto d-block border w-50" src="https://files.realpython.com/media/Capture6.3d06a62b8e87.PNG" width="863" height="565" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture6.3d06a62b8e87.PNG&amp;w=215&amp;sig=0d4c23758a58848ea65b74514aca9e15a72748fa 215w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture6.3d06a62b8e87.PNG&amp;w=431&amp;sig=05f0ebe121ef54abf51bcdb55fba695d28656868 431w, https://files.realpython.com/media/Capture6.3d06a62b8e87.PNG 863w" sizes="75vw" alt="Selecting a solution"/></a></p>
3072 <p>When the solution is loaded, it will prompt you to retarget the projects inside the solution to the version of the C/C++ compiler you have installed. Visual Studio will also target the version of the Windows SDK you have installed.</p>
3073 <p>Ensure that you change the Windows SDK version to the newest installed version and the platform toolset to the latest version. If you missed this window, you can right-click on the Solution in the <em>Solutions and Projects</em> window and click <em>Retarget Solution</em>.</p>
3074 <p>Once this is complete, you need to download the source code for some external dependencies to be able to build the whole CPython package. Inside the <code>PCBuild</code> folder there is a <code>.bat</code> file that automates this for you. <a href="https://www.youtube.com/watch?v=bgSSJQolR0E">Open up a command-line prompt inside</a> the downloaded <code>PCBuild</code> folder and run <code>get_externals.bat</code>:</p>
3075 <div class="highlight sh"><pre><span></span><span class="go"> &gt; get_externals.bat</span>
3076 <span class="go">Using py -3.7 (found 3.7 with py.exe)</span>
3077 <span class="go">Fetching external libraries...</span>
3078 <span class="go">Fetching bzip2-1.0.6...</span>
3079 <span class="go">Fetching sqlite-3.21.0.0...</span>
3080 <span class="go">Fetching xz-5.2.2...</span>
3081 <span class="go">Fetching zlib-1.2.11...</span>
3082 <span class="go">Fetching external binaries...</span>
3083 <span class="go">Fetching openssl-bin-1.1.0j...</span>
3084 <span class="go">Fetching tcltk-8.6.9.0...</span>
3085 <span class="go">Finished.</span>
3086 </pre></div>
3087 
3088 <p>Next, back within Visual Studio, build CPython by pressing <span class="keys"><kbd class="key-control">Ctrl</kbd><span>+</span><kbd class="key-shift">Shift</kbd><span>+</span><kbd class="key-b">B</kbd></span>, or choosing <em>Build Solution</em> from the top menu. If you receive any errors about the Windows SDK being missing, make sure you set the right targeting settings in the <em>Retarget Solution</em> window. You should also see <em>Windows Kits</em> inside your Start Menu, and <em>Windows Software Development Kit</em> inside of that menu.</p>
3089 <p>The first build could take 10 minutes or more. Once it has completed, you may see a few warnings, which you can ignore.</p>
3090 <p>To start the debug version of CPython, press <span class="keys"><kbd class="key-f5">F5</kbd></span> and CPython will start in Debug mode straight into the REPL:</p>
3091 <p><a href="https://files.realpython.com/media/Capture8.967a3606daf0.PNG" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Capture8.967a3606daf0.PNG" width="3360" height="2100" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture8.967a3606daf0.PNG&amp;w=840&amp;sig=1822fde8ffe6946fc91e47dd8975aa23add8b23a 840w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Capture8.967a3606daf0.PNG&amp;w=1680&amp;sig=a093e68320ad548384c585840002e4294ab4bf94 1680w, https://files.realpython.com/media/Capture8.967a3606daf0.PNG 3360w" sizes="75vw" alt="CPython debugging Windows"/></a></p>
3092 <p>Once this is completed, you can build the Release version by changing the build configuration from <em>Debug</em> to <em>Release</em> on the top menu bar and running <em>Build Solution</em> again.
3093 You now have both Debug and Release versions of the CPython binary within <code>PCBuild\win32\</code>.</p>
3094 <p>You can set up Visual Studio to be able to open a REPL with either the Release or Debug build by choosing <em><code>Tools</code>-&gt;<code>Python</code>-&gt;<code>Python Environments</code></em> from the top menu:</p>
3095 <p><a href="https://files.realpython.com/media/Environments.96a819ecf0b3.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/Environments.96a819ecf0b3.png" width="3360" height="2033" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Environments.96a819ecf0b3.png&amp;w=840&amp;sig=f8dd9b3b31d44c25cbe06d56078b0462cb0fa753 840w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Environments.96a819ecf0b3.png&amp;w=1680&amp;sig=e8ca69be87363b62ccda44bcf53bedd4e4e320c2 1680w, https://files.realpython.com/media/Environments.96a819ecf0b3.png 3360w" sizes="75vw" alt="Choosing Python environments"/></a></p>
3096 <p>Then click <em>Add Environment</em> and target the Debug or Release binary. The Debug binaries end in <code>_d.exe</code>, for example, <code>python_d.exe</code> and <code>pythonw_d.exe</code>. You will most likely want to use the Debug binary, as it comes with debugging support in Visual Studio and will be useful for this tutorial.</p>
3097 <p>In the Add Environment window, target the <code>python_d.exe</code> file inside the <code>PCBuild/win32</code> folder as the interpreter and <code>pythonw_d.exe</code> as the windowed interpreter:</p>
3098 <p><a href="https://files.realpython.com/media/environment3.d33858c1f6aa.PNG" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/environment3.d33858c1f6aa.PNG" width="2048" height="1352" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environment3.d33858c1f6aa.PNG&amp;w=512&amp;sig=ffc6b359ac60d689f40f98233466cef35354d239 512w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environment3.d33858c1f6aa.PNG&amp;w=1024&amp;sig=d881ce7ecabc62624fb2a5154102a5be2ff6a4db 1024w, https://files.realpython.com/media/environment3.d33858c1f6aa.PNG 2048w" sizes="75vw" alt="Adding an environment in VS2019"/></a></p>
3099 <p>Now you can start a REPL session by clicking <em>Open Interactive Window</em> in the Python Environments window, and you will see the REPL for the compiled version of Python:</p>
3100 <p><a href="https://files.realpython.com/media/environment4.7c9eade3b74e.PNG" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/environment4.7c9eade3b74e.PNG" width="3360" height="2033" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environment4.7c9eade3b74e.PNG&amp;w=840&amp;sig=9e384e72bcfdebb39fe6dc23f21c239a0895ad51 840w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environment4.7c9eade3b74e.PNG&amp;w=1680&amp;sig=be56aac5b9baffc9b8b0de4701527954ed32ef53 1680w, https://files.realpython.com/media/environment4.7c9eade3b74e.PNG 3360w" sizes="75vw" alt="Python Environment REPL"/></a></p>
3101 <p>During this tutorial, there will be REPL sessions with example commands. I encourage you to use the Debug binary to run these REPL sessions in case you want to set breakpoints within the code.</p>
3102 <p>Lastly, to make it easier to navigate the code, in the Solution View, click on the toggle button next to the Home icon to switch to Folder view:</p>
3103 <p><a href="https://files.realpython.com/media/environments5.6462694398e3.6fb872a5f57d.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/environments5.6462694398e3.6fb872a5f57d.png" width="1231" height="692" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environments5.6462694398e3.6fb872a5f57d.png&amp;w=307&amp;sig=14bf2b50bd86dfabedc3ee1ec75a60ebccd574c4 307w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/environments5.6462694398e3.6fb872a5f57d.png&amp;w=615&amp;sig=0d5644853d8c429ccc78fc1bced16d8f571f555d 615w, https://files.realpython.com/media/environments5.6462694398e3.6fb872a5f57d.png 1231w" sizes="75vw" alt="Switching Environment Mode"/></a></p>
3104 <p>Now that you have a version of CPython compiled and ready to go, let&rsquo;s find out how the CPython compiler works.</p>
3105 <h3 id="what-does-a-compiler-do">What Does a Compiler Do?</h3>
3106 <p>The purpose of a compiler is to convert one language into another. Think of a compiler like a translator. You would hire a translator to listen to you speaking in English and then repeat what you said in Japanese:</p>
3107 <p><a href="https://files.realpython.com/media/t.38be306a7e83.png" target="_blank"><img class="img-fluid mx-auto d-block w-75" src="https://files.realpython.com/media/t.38be306a7e83.png" width="960" height="540" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/t.38be306a7e83.png&amp;w=240&amp;sig=2ad6eec49af1eaba79b83c099925e01a970d5efd 240w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/t.38be306a7e83.png&amp;w=480&amp;sig=033587107c8dceca7bcc292527a4c815fffb0b8d 480w, https://files.realpython.com/media/t.38be306a7e83.png 960w" sizes="75vw" alt="Translating from English to Japanese"/></a></p>
3108 <p>Some compilers will compile into low-level machine code, which can be executed directly on a system. Other compilers will compile into an intermediary language, to be executed by a virtual machine.</p>
3109 <p>One important decision to make when choosing a compiler is the system portability requirements. <a href="https://en.wikipedia.org/wiki/Java_bytecode">Java</a> and <a href="https://en.wikipedia.org/wiki/Common_Language_Runtime">.NET CLR</a> will compile into an Intermediary Language so that the compiled code is portable across multiple system architectures. C, Go, C++, and Pascal will compile into a low-level executable that will only work on systems similar to the one it was compiled on.</p>
3110 <p>Because Python applications are typically distributed as source code, the role of the Python runtime is to convert the Python source code and execute it in one step. Internally, the CPython runtime does compile your code. A popular misconception is that Python is an interpreted language. It is actually compiled.</p>
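<p>You can watch this compilation step from inside Python itself. The sketch below uses the built-in <code>compile()</code> function to turn a one-line snippet into a code object, then the standard library&rsquo;s <code>dis</code> module to list the bytecode instructions it contains. The helper <code>show_bytecode()</code> and the snippet are illustrative, and the exact instructions you see vary between CPython versions.</p>

```python
import dis


def show_bytecode(source):
    """Compile a snippet of Python source and return its bytecode opcode names."""
    # compile() runs CPython's compiler and returns a code object, not machine code
    code = compile(source, "example.py", "exec")
    # dis.get_instructions() decodes the raw bytecode stored on the code object
    return [instruction.opname for instruction in dis.get_instructions(code)]


source = "total = 1 + 2\n"
code = compile(source, "example.py", "exec")
print(type(code).__name__)      # code
print(show_bytecode(source))    # opcode names; the exact list depends on the version
```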
3111 <p>Python code is not compiled into machine code. It is compiled into a special low-level intermediary language called <strong>bytecode</strong> that only CPython understands. For imported modules, this bytecode is cached in <code>.pyc</code> files inside a <code>__pycache__</code> directory next to the source. If you run the same Python application twice without changing the source code, it will typically start faster the second time, because CPython loads the cached bytecode and executes it directly instead of recompiling the source. The bytecode itself does not run any faster.</p>
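<p>The standard library exposes the same caching machinery. In this minimal sketch, <code>py_compile.compile()</code> byte-compiles a throwaway module (the name <code>greeting.py</code> is made up for the example) and returns the path of the <code>.pyc</code> file it wrote, which matches the location <code>importlib.util.cache_from_source()</code> computes for that module. The cached file name includes an interpreter tag, such as <code>cpython-38</code>, that depends on your build.</p>

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Write a tiny, hypothetical module to a temporary directory
    module_path = os.path.join(folder, "greeting.py")
    with open(module_path, "w") as f:
        f.write("MESSAGE = 'hello'\n")

    # Byte-compile it; py_compile returns the path of the .pyc file it wrote
    pyc_path = py_compile.compile(module_path, doraise=True)
    existed = os.path.exists(pyc_path)

    # Ask the import system where it would look for the cached bytecode
    expected = importlib.util.cache_from_source(module_path)

print(pyc_path)
print(pyc_path == expected, existed)
```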
3112 <h3 id="why-is-cpython-written-in-c-and-not-python">Why Is CPython Written in C and Not Python?</h3>
3113 <p>The <strong>C</strong> in CPython is a reference to the C programming language, implying that this Python distribution is written in the C language.</p>
3114 <p>This statement is largely true: the compiler in CPython is written in pure C. However, many of the standard library modules are written in pure Python or a combination of C and Python.</p>
3115 <p><strong>So why is CPython written in C and not Python?</strong></p>
3116 <p>The answer lies in how compilers work. There are two types of compiler:</p>
3117 <ol>
3118 <li><strong><a href="https://en.wikipedia.org/wiki/Self-hosting">Self-hosted compilers</a></strong> are compilers written in the language they compile, such as the Go compiler.</li>
3119 <li><strong><a href="https://en.wikipedia.org/wiki/Source-to-source_compiler">Source-to-source compilers</a></strong> are compilers written in another language that already has a compiler.</li>
3120 </ol>
3121 <p>If you&rsquo;re writing a new programming language from scratch, you need an executable application to compile your compiler! You need a compiler to execute anything, so when new languages are developed, they&rsquo;re often written first in an older, more established language.</p>
3122 <p>A good example would be the Go programming language. The first Go compiler was written in C; once Go could be compiled, the compiler was rewritten in Go.</p>
3123 <p>CPython kept its C heritage: many of the standard library modules, like the <code>ssl</code> module or the <code>socket</code> module, are written in C to access low-level operating system APIs.
3124 The APIs in the Windows and Linux kernels for <a href="https://realpython.com/python-sockets/">creating network sockets</a>, <a href="https://realpython.com/working-with-files-in-python/">working with the filesystem</a> or <a href="https://realpython.com/python-gui-with-wxpython/">interacting with the display</a> are all written in C. It made sense for Python&rsquo;s extensibility layer to be focused on the C language. Later in this article, we will cover the Python Standard Library and the C modules.</p>
3125 <p>There is a Python compiler written in Python called <a href="https://pypy.org/">PyPy</a>. PyPy&rsquo;s logo is an <a href="https://en.wikipedia.org/wiki/Ouroboros">Ouroboros</a> to represent the self-hosting nature of the compiler.</p>
3126 <p>Another example of a Python compiler written in a different language is <a href="https://www.jython.org/">Jython</a>. Jython is written in Java and compiles from Python source code into Java bytecode. In the same way that CPython makes it easy to import C libraries and use them from Python, Jython makes it easy to import and reference Java modules and classes.</p>
3127 <h3 id="the-python-language-specification">The Python Language Specification</h3>
3128 <p>Contained within the CPython source code is the definition of the Python language. This is the reference specification used by all the Python interpreters.</p>
3129 <p>The specification is available in both human-readable and machine-readable formats. Inside the documentation is a detailed explanation of the Python language, what is allowed, and how each statement should behave.</p>
3130 <h4 id="documentation">Documentation</h4>
3131 <p>Located inside the <code>Doc/reference</code> directory are <a href="http://docutils.sourceforge.net/rst.html">reStructuredText</a> explanations of each of the features in the Python language. This forms the official Python reference guide on <a href="https://docs.python.org/3/reference/">docs.python.org</a>.</p>
3132 <p>Inside the directory are the files you need to understand the whole language, structure, and keywords:</p>
3133 <div class="highlight"><pre><span></span>cpython/Doc/reference
3134 |
3135 ├── compound_stmts.rst
3136 ├── datamodel.rst
3137 ├── executionmodel.rst
3138 ├── expressions.rst
3139 ├── grammar.rst
3140 ├── import.rst
3141 ├── index.rst
3142 ├── introduction.rst
3143 ├── lexical_analysis.rst
3144 ├── simple_stmts.rst
3145 └── toplevel_components.rst
3146 </pre></div>
3147 
3148 <p>Inside <code>compound_stmts.rst</code>, the documentation for compound statements, you can see a simple example defining the <code>with</code> statement.</p>
3149 <p>The <code>with</code> statement can be used in multiple ways in Python, the simplest being the <a href="https://dbader.org/blog/python-context-managers-and-with-statement">instantiation of a context manager</a> and a nested block of code:</p>
3150 <div class="highlight python"><pre><span></span><span class="k">with</span> <span class="n">x</span><span class="p">():</span>
3151    <span class="o">...</span>
3152 </pre></div>
3153 
3154 <p>You can assign the result to a variable using the <code>as</code> keyword:</p>
3155 <div class="highlight python"><pre><span></span><span class="k">with</span> <span class="n">x</span><span class="p">()</span> <span class="k">as</span> <span class="n">y</span><span class="p">:</span>
3156    <span class="o">...</span>
3157 </pre></div>
3158 
3159 <p>You can also chain context managers together with a comma:</p>
3160 <div class="highlight python"><pre><span></span><span class="k">with</span> <span class="n">x</span><span class="p">()</span> <span class="k">as</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">()</span> <span class="k">as</span> <span class="n">jk</span><span class="p">:</span>
3161    <span class="o">...</span>
3162 </pre></div>
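<p>To try the three forms above with real objects, here is a minimal sketch built on <code>contextlib.contextmanager</code> from the standard library. The <code>tracked()</code> generator is a hypothetical stand-in for <code>x()</code> and <code>z()</code>: the value it yields is what the <code>as</code> clause binds, and it records when each manager is entered and exited, which shows that chained managers exit in reverse order.</p>

```python
from contextlib import contextmanager

events = []  # records the order in which the managers enter and exit


@contextmanager
def tracked(name):
    """A hypothetical context manager that logs its entry and exit."""
    events.append("enter " + name)
    try:
        yield name.upper()   # this value is bound by the 'as' clause
    finally:
        events.append("exit " + name)


with tracked("x"):
    pass

with tracked("x") as y:
    print(y)                 # X

with tracked("x") as y, tracked("z") as jk:
    print(y, jk)             # X Z

print(events)
```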
3163 
3164 <p>Next, we&rsquo;ll explore the machine-readable specification of the Python language.</p>
3165 <h4 id="grammar">Grammar</h4>
3166 <p>The documentation contains the human-readable specification of the language, and the machine-readable specification is housed in a single file, <a href="https://github.com/python/cpython/blob/master/Grammar/Grammar"><code>Grammar/Grammar</code></a>. </p>
3167 <p>The Grammar file is written in a notation for context-free grammars called <a href="https://en.m.wikipedia.org/wiki/Backus%E2%80%93Naur_form">Backus-Naur Form (BNF)</a>. BNF is not specific to Python and is used to describe the grammars of many other languages.</p>
3168 <p>The concept of grammatical structure in a programming language is inspired by <a href="https://en.wikipedia.org/wiki/Syntactic_Structures">Noam Chomsky&rsquo;s work on Syntactic Structures</a> in the 1950s!</p>
3169 <p>Python&rsquo;s grammar file uses the Extended-BNF (EBNF) specification with regular-expression syntax. So, in the grammar file you can use:</p>
3170 <ul>
3171 <li><strong><code>*</code></strong> for repetition (zero or more times)</li>
3172 <li><strong><code>+</code></strong> for at-least-once repetition</li>
3173 <li><strong><code>[]</code></strong> for optional parts</li>
3174 <li><strong><code>|</code></strong> for alternatives</li>
3175 <li><strong><code>()</code></strong> for grouping</li>
3176 </ul>
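<p>Because these operators behave like their regular-expression counterparts, a rule made only of names and punctuation can be approximated with Python&rsquo;s <code>re</code> module. The sketch below assumes rules shaped like the grammar&rsquo;s <code>import_name</code>, <code>dotted_as_name</code>, and <code>dotted_name</code> definitions, collapsed into one pattern and simplified to a fixed single space between tokens. It is only an illustration: a real parser works on tokens rather than raw text, and this pattern does not reject keywords used as names.</p>

```python
import re

NAME = r"[A-Za-z_]\w*"
# dotted_name: NAME ('.' NAME)*            ( ... )* becomes (?: ... )*
DOTTED_NAME = NAME + r"(?:\." + NAME + r")*"
# dotted_as_name: dotted_name ['as' NAME]  [ ... ] becomes (?: ... )?
DOTTED_AS_NAME = DOTTED_NAME + r"(?: as " + NAME + r")?"
# 'import' dotted_as_name (',' dotted_as_name)*, one space between tokens
IMPORT_NAME = re.compile(
    r"import " + DOTTED_AS_NAME + r"(?:, " + DOTTED_AS_NAME + r")*"
)


def is_import_name(line):
    """Return True if the whole line matches the simplified import rule."""
    return IMPORT_NAME.fullmatch(line) is not None


for line in ["import os", "import os.path as p, sys", "import os.", "import os,"]:
    print(repr(line), is_import_name(line))
```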
3177 <p>If you search for the <code>with</code> statement in the grammar file, at around line 80 you&rsquo;ll see the definitions for the <code>with</code> statement:</p>
3178 <div class="highlight text"><pre><span></span>with_stmt: &#39;with&#39; with_item (&#39;,&#39; with_item)*  &#39;:&#39; suite
3179 with_item: test [&#39;as&#39; expr]
3180 </pre></div>
3181 
3182 <p>Anything in quotes is a string literal, which is how keywords are defined. So the <code>with_stmt</code> is specified as:</p>
3183 <ol>
3184 <li>Starting with the word <code>with</code></li>
3185 <li>Followed by a <code>with_item</code>, which is a <code>test</code> and, optionally, the word <code>as</code> followed by an expression</li>
3186 <li>Followed by zero or more further <code>with_item</code> entries, each preceded by a comma</li>
3187 <li>Ending with a <code>:</code></li>
3188 <li>Followed by a <code>suite</code></li>
3189 </ol>
3190 <p>There are references to some other definitions in these two lines:</p>
3191 <ul>
3192 <li><strong><code>suite</code></strong> refers to a block of code with one or multiple statements</li>
3193 <li><strong><code>test</code></strong> refers to an expression that is evaluated, such as the call <code>x()</code></li>
3194 <li><strong><code>expr</code></strong> refers to a simple expression</li>
3195 </ul>
3196 <p>If you want to explore those in detail, the whole of the Python grammar is defined in this single file.</p>
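<p>A quick way to confirm how a <code>with</code> statement breaks down into <code>with_item</code> parts is the standard library&rsquo;s <code>ast</code> module, which exposes the tree the compiler builds after parsing. This previews a later stage than the grammar file itself. The helper <code>with_items()</code> is hypothetical and assumes each context expression is a simple call such as <code>x()</code>.</p>

```python
import ast


def with_items(source):
    """Parse a with statement and return (call name, 'as' target) for each with_item."""
    statement = ast.parse(source).body[0]
    if not isinstance(statement, ast.With):
        raise ValueError("expected a with statement")
    pairs = []
    for item in statement.items:            # one entry per with_item
        call = item.context_expr            # the 'test' part, assumed to be a call like x()
        target = item.optional_vars         # None when there is no 'as' clause
        pairs.append((call.func.id, target.id if target is not None else None))
    return pairs


print(with_items("with x():\n    pass\n"))                   # [('x', None)]
print(with_items("with x() as y, z() as jk:\n    pass\n"))   # [('x', 'y'), ('z', 'jk')]
```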
3197 <p>For a recent example of how the grammar is changed, see <a href="https://github.com/python/cpython/commit/8f59ee01be3d83d5513a9a3f654a237d77d80d9a#diff-cb0b9d6312c0d67f6d4aa1966766cedd">this Git commit</a>, which added the <strong>colon equals</strong> (<code>:=</code>) operator from PEP 572 to the grammar file.</p>
3198 <h4 id="using-pgen">Using <code>pgen</code></h4>
3199 <p>The grammar file itself is never used by the Python compiler. Instead, a parser table created by a tool called <code>pgen</code> is used. <code>pgen</code> reads the grammar file and converts it into a parser table. If you make changes to the grammar file, you must regenerate the parser table and recompile Python.</p>
3200 <div class="alert alert-primary" role="alert">
3201 <p><strong>Note:</strong> The <code>pgen</code> application was rewritten in Python 3.8 from C to <a href="https://github.com/python/cpython/blob/master/Parser/pgen/pgen.py">pure Python</a>.</p>
3202 </div>
3203 <p>To see <code>pgen</code> in action, let&rsquo;s change part of the Python grammar. Around line 51 you will see the definition of a <code>pass</code> statement:</p>
3204 <div class="highlight text"><pre><span></span>pass_stmt: &#39;pass&#39;
3205 </pre></div>
3206 
3207 <p>Change that line to accept either <code>'pass'</code> or <code>'proceed'</code> as the keyword:</p>
3208 <div class="highlight text"><pre><span></span>pass_stmt: &#39;pass&#39; | &#39;proceed&#39;
3209 </pre></div>
3210 
3211 <p>Now you need to rebuild the grammar files.
3212 On macOS and Linux, run <code>make regen-grammar</code> to run <code>pgen</code> over the altered grammar file. For Windows, there is no officially supported way of running <code>pgen</code>. However, you can clone <a href="https://github.com/tonybaloney/cpython/tree/pcbuildregen">my fork</a> and run <code>build.bat --regen</code> from within the <code>PCBuild</code> directory.</p>
3213 <p>You should see an output similar to this, showing that the new <code>Include/graminit.h</code> and <code>Python/graminit.c</code> files have been generated:</p>
3214 <div class="highlight text"><pre><span></span># Regenerate Doc/library/token-list.inc from Grammar/Tokens
3215 # using Tools/scripts/generate_token.py
3216 ...
3217 python3 ./Tools/scripts/update_file.py ./Include/graminit.h ./Include/graminit.h.new
3218 python3 ./Tools/scripts/update_file.py ./Python/graminit.c ./Python/graminit.c.new
3219 </pre></div>
3220 
3221 <div class="alert alert-primary" role="alert">
3222 <p><strong>Note:</strong> <code>pgen</code> works by converting the EBNF statements into a <a href="https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton">Non-deterministic Finite Automaton (NFA)</a>, which is then turned into a <a href="https://en.wikipedia.org/wiki/Deterministic_finite_automaton">Deterministic Finite Automaton (DFA)</a>.
3223 The DFAs are used by the parser as parsing tables in a special way that&rsquo;s unique to CPython. This technique was <a href="http://infolab.stanford.edu/~ullman/dragon/slides1.pdf">developed at Stanford University</a> in the 1980s, just before the advent of Python.</p>
3224 </div>
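<p>To make the NFA-to-DFA step concrete, here is the textbook subset construction in Python. It is not <code>pgen</code>&rsquo;s implementation: the hand-built NFA is a hypothetical, simplified version of the <code>with_stmt</code> rule in which <code>ITEM</code> stands in for a whole <code>with_item</code> and the <code>suite</code> is left out. Each DFA state is the set of NFA states the machine could be in, which removes the epsilon (empty) transitions and leaves at most one move per token.</p>

```python
from collections import deque

EPSILON = None  # label for transitions that consume no token

# Hypothetical NFA for a simplified rule:  'with' ITEM (',' ITEM)* ':'
# Each state maps to a list of (label, next_state) transitions.
NFA = {
    0: [("with", 1)],
    1: [("ITEM", 2)],
    2: [(EPSILON, 3)],
    3: [(",", 4), (EPSILON, 6)],   # either repeat the group or leave it
    4: [("ITEM", 5)],
    5: [(EPSILON, 3)],             # loop back for another ', ITEM'
    6: [(":", 7)],
}
START, ACCEPTING = 0, {7}


def epsilon_closure(states):
    """Return all NFA states reachable from `states` using only epsilon moves."""
    closure, stack = set(states), list(states)
    while stack:
        for label, target in NFA.get(stack.pop(), []):
            if label is EPSILON and target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def subset_construction():
    """Build a DFA whose states are sets of NFA states."""
    start = epsilon_closure({START})
    transitions = {}                   # (dfa_state, label) -> dfa_state
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        labels = {label for s in current for label, _ in NFA.get(s, [])
                  if label is not EPSILON}
        for label in labels:
            moved = {t for s in current for lab, t in NFA.get(s, []) if lab == label}
            target = epsilon_closure(moved)
            transitions[(current, label)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    accepting = {state for state in seen if not state.isdisjoint(ACCEPTING)}
    return start, transitions, accepting, seen


def dfa_accepts(dfa, tokens):
    """Run the DFA: exactly one transition per token, no backtracking."""
    state, transitions, accepting, _ = dfa
    for tok in tokens:
        state = transitions.get((state, tok))
        if state is None:
            return False
    return state in accepting


dfa = subset_construction()
print(len(dfa[3]), "DFA states")                              # 6 DFA states
print(dfa_accepts(dfa, ["with", "ITEM", ",", "ITEM", ":"]))   # True
print(dfa_accepts(dfa, ["with", ":"]))                        # False
```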
3225 <p>With the regenerated parser tables, you need to recompile CPython to see the new syntax. Use the same compilation steps you used earlier for your operating system.</p>
3226 <p>If the code compiled successfully, you can execute your new CPython binary and start a REPL.</p>
3227 <p>In the REPL, you can now try defining a function and, instead of using the <code>pass</code> statement, use the <code>proceed</code> keyword alternative that you compiled into the Python grammar:</p>
3228 <div class="highlight text"><pre><span></span>Python 3.8.0b4 (tags/v3.8.0b4:d93605de72, Aug 30 2019, 10:00:03) 
3229 [Clang 10.0.1 (clang-1001.0.46.4)] on darwin
3230 Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.
3231 &gt;&gt;&gt; def example():
3232 ...    proceed
3233 ... 
3234 &gt;&gt;&gt; example()
3235 </pre></div>
3236 
3237 <p>Well done! You&rsquo;ve changed the CPython syntax and compiled your own version of CPython. Ship it!</p>
3238 <p>Next, we&rsquo;ll explore tokens and their relationship to grammar.</p>
3239 <h4 id="tokens">Tokens</h4>
3240 <p>Alongside the grammar file in the <code>Grammar</code> folder is a <a href="https://github.com/python/cpython/blob/master/Grammar/Tokens"><code>Tokens</code></a> file, which contains each of the unique token types found as leaf nodes in a parse tree. We will cover parse trees in depth later.
3241 Each token also has a name and a generated unique ID. The names make it simpler to refer to tokens in the tokenizer.</p>
3242 <div class="alert alert-primary" role="alert">
3243 <p><strong>Note:</strong> The <code>Tokens</code> file is a new feature in Python 3.8.</p>
3244 </div>
3245 <p>For example, the left parenthesis is called <code>LPAR</code>, and semicolons are called <code>SEMI</code>. You&rsquo;ll see these tokens later in the article:</p>
3246 <div class="highlight text"><pre><span></span>LPAR                    &#39;(&#39;
3247 RPAR                    &#39;)&#39;
3248 LSQB                    &#39;[&#39;
3249 RSQB                    &#39;]&#39;
3250 COLON                   &#39;:&#39;
3251 COMMA                   &#39;,&#39;
3252 SEMI                    &#39;;&#39;
3253 </pre></div>
3254 
3255 <p>As with the <code>Grammar</code> file, if you change the <code>Tokens</code> file, you need to run <code>pgen</code> again. </p>
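<p>The same token names and their numeric IDs are exposed to Python code through the standard library&rsquo;s <code>token</code> module. This sketch checks the entries from the excerpt above: <code>token.tok_name</code> maps an ID back to its name, and <code>token.EXACT_TOKEN_TYPES</code> (assumed available, as in Python 3.8 and later) maps the character to its ID. The helper <code>token_id()</code> is hypothetical, and the numeric values differ between CPython versions.</p>

```python
import token

# The entries from the Grammar/Tokens excerpt above: name -> character
EXCERPT = {"LPAR": "(", "RPAR": ")", "LSQB": "[", "RSQB": "]",
           "COLON": ":", "COMMA": ",", "SEMI": ";"}


def token_id(name):
    """Return the numeric ID CPython generated for a token name."""
    return getattr(token, name)


for name, char in EXCERPT.items():
    tid = token_id(name)
    # tok_name: ID -> name; EXACT_TOKEN_TYPES: character -> ID
    print(repr(char), name, tid,
          token.tok_name[tid] == name,
          token.EXACT_TOKEN_TYPES[char] == tid)
```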
3256 <p>To see tokens in action, you can use the <code>tokenize</code> module in CPython. Create a simple Python script called <code>test_tokens.py</code>:</p>
3257 <div class="highlight python"><pre><span></span><span class="c1"># Hello world!</span>
3258 <span class="k">def</span> <span class="nf">my_function</span><span class="p">():</span>
3259    <span class="n">proceed</span>
3260 </pre></div>
3261 
3262 <div class="alert alert-primary" role="alert">
3263 <p>For the rest of this tutorial, <code>./python.exe</code> will refer to the compiled version of CPython. However, the actual command will depend on your system.</p>
3264 <p>For Windows:</p>
3265 <div class="highlight sh"><pre><span></span><span class="go"> &gt; python.exe</span>
3266 </pre></div>
3267 
3268 <p>For Linux:</p>
3269 <div class="highlight sh"><pre><span></span><span class="go"> &gt; ./python</span>
3270 </pre></div>
3271 
3272 <p>For macOS:</p>
3273 <div class="highlight sh"><pre><span></span><span class="go"> &gt; ./python.exe</span>
3274 </pre></div>
3275 
3276 </div>
3277 <p>Then pass this file through a module built into the standard library called <code>tokenize</code>. You will see the list of tokens with their line and column positions. Use the <code>-e</code> flag to output the exact token name:</p>
3278 <div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe -m tokenize -e test_tokens.py
3279 
3280 <span class="go">0,0-0,0:            ENCODING       &#39;utf-8&#39;        </span>
3281 <span class="go">1,0-1,14:           COMMENT        &#39;# Hello world!&#39;</span>
3282 <span class="go">1,14-1,15:          NL             &#39;\n&#39;           </span>
3283 <span class="go">2,0-2,3:            NAME           &#39;def&#39;          </span>
3284 <span class="go">2,4-2,15:           NAME           &#39;my_function&#39;  </span>
3285 <span class="go">2,15-2,16:          LPAR           &#39;(&#39;            </span>
3286 <span class="go">2,16-2,17:          RPAR           &#39;)&#39;            </span>
3287 <span class="go">2,17-2,18:          COLON          &#39;:&#39;            </span>
3288 <span class="go">2,18-2,19:          NEWLINE        &#39;\n&#39;           </span>
3289 <span class="go">3,0-3,3:            INDENT         &#39;   &#39;          </span>
3290 <span class="go">3,3-3,10:           NAME           &#39;proceed&#39;      </span>
3291 <span class="go">3,10-3,11:          NEWLINE        &#39;\n&#39;           </span>
3292 <span class="go">4,0-4,0:            DEDENT         &#39;&#39;             </span>
3293 <span class="go">4,0-4,0:            ENDMARKER      &#39;&#39;              </span>
3294 </pre></div>
3295 
3296 <p>In the output, the first column is the range of the line/column coordinates, the second column is the name of the token, and the final column is the value of the token.</p>
3297 <p>In the output, the <code>tokenize</code> module has implied some tokens that were not in the file: an <code>ENCODING</code> token for <code>utf-8</code> at the start and, at the end, a <code>DEDENT</code> to close the function declaration and an <code>ENDMARKER</code> to end the file.</p>
3298 <p>It is best practice to end your Python source files with a trailing newline. If you omit it, CPython adds it for you, with a tiny performance penalty.</p>
3299 <p>The <code>tokenize</code> module is written in pure Python and is located in <a href="https://github.com/python/cpython/blob/master/Lib/tokenize.py"><code>Lib/tokenize.py</code></a> within the CPython source code.</p>
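<p>The <code>tokenize</code> module can also be driven from code. This sketch feeds the contents of <code>test_tokens.py</code> to <code>tokenize.tokenize()</code> and collects the exact token names, which makes the implied <code>ENCODING</code>, <code>DEDENT</code>, and <code>ENDMARKER</code> tokens easy to spot. Since <code>tokenize</code> only splits text into tokens and does not parse it, <code>proceed</code> comes through as an ordinary <code>NAME</code> even on a stock CPython build. The helper <code>token_names()</code> is hypothetical.</p>

```python
import io
import token
import tokenize


def token_names(source):
    """Tokenize a bytes object of Python source and return the exact token names."""
    readline = io.BytesIO(source).readline
    return [token.tok_name[tok.exact_type] for tok in tokenize.tokenize(readline)]


# The contents of test_tokens.py from above
script = b"# Hello world!\ndef my_function():\n   proceed\n"
print(token_names(script))

# With no trailing newline, tokenize still reports a NEWLINE token
print(token_names(b"proceed"))
```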
3300 <div class="alert alert-primary" role="alert">
3301 <p><strong>Important:</strong> There are two tokenizers in the CPython source code: one written in Python, demonstrated here, and another written in C.
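3302 The tokenizer written in Python is meant as a utility, and the one written in C is used by the Python compiler. They split source code into tokens the same way, although the Python version also reports tokens that the parser never sees, such as <code>COMMENT</code>, <code>NL</code>, and <code>ENCODING</code>. The version written in C is designed for performance and the module in Python is designed for debugging.</p>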
3303 </div>
3304 <p>To see a verbose readout of the C tokenizer, you can run Python with the <code>-d</code> flag, which only produces this output in a debug build. Using the <code>test_tokens.py</code> script you created earlier, run it with the following:</p>
3305 <div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe -d test_tokens.py
3306 
3307 <span class="go">Token NAME/&#39;def&#39; ... It&#39;s a keyword</span>
3308 <span class="go"> DFA &#39;file_input&#39;, state 0: Push &#39;stmt&#39;</span>
3309 <span class="go"> DFA &#39;stmt&#39;, state 0: Push &#39;compound_stmt&#39;</span>
3310 <span class="go"> DFA &#39;compound_stmt&#39;, state 0: Push &#39;funcdef&#39;</span>
3311 <span class="go"> DFA &#39;funcdef&#39;, state 0: Shift.</span>
3312 <span class="go">Token NAME/&#39;my_function&#39; ... It&#39;s a token we know</span>
3313 <span class="go"> DFA &#39;funcdef&#39;, state 1: Shift.</span>
3314 <span class="go">Token LPAR/&#39;(&#39; ... It&#39;s a token we know</span>
3315 <span class="go"> DFA &#39;funcdef&#39;, state 2: Push &#39;parameters&#39;</span>
3316 <span class="go"> DFA &#39;parameters&#39;, state 0: Shift.</span>
3317 <span class="go">Token RPAR/&#39;)&#39; ... It&#39;s a token we know</span>
3318 <span class="go"> DFA &#39;parameters&#39;, state 1: Shift.</span>
3319 <span class="go">  DFA &#39;parameters&#39;, state 2: Direct pop.</span>
3320 <span class="go">Token COLON/&#39;:&#39; ... It&#39;s a token we know</span>
3321 <span class="go"> DFA &#39;funcdef&#39;, state 3: Shift.</span>
3322 <span class="go">Token NEWLINE/&#39;&#39; ... It&#39;s a token we know</span>
3323 <span class="go"> DFA &#39;funcdef&#39;, state 5: [switch func_body_suite to suite] Push &#39;suite&#39;</span>
3324 <span class="go"> DFA &#39;suite&#39;, state 0: Shift.</span>
3325 <span class="go">Token INDENT/&#39;&#39; ... It&#39;s a token we know</span>
3326 <span class="go"> DFA &#39;suite&#39;, state 1: Shift.</span>
3327 <span class="hll"><span class="go">Token NAME/&#39;proceed&#39; ... It&#39;s a keyword</span>
3328 </span><span class="go"> DFA &#39;suite&#39;, state 3: Push &#39;stmt&#39;</span>
3329 <span class="go">...</span>
3330 <span class="go">  ACCEPT.</span>
3331 </pre></div>
3332 
3333 <p>In the output, you can see that it highlighted <code>proceed</code> as a keyword. In the next chapter, we&rsquo;ll see how executing the Python binary gets to the tokenizer and what happens from there to execute your code.</p>
3334 <p>Now that you have an overview of the Python grammar and the relationship between tokens and statements, you can also convert the <code>pgen</code> output into an interactive graph.</p>
3335 <p>Here is a screenshot of the Python 3.8a2 grammar:</p>
3336 <p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-12_at_2.31.16_pm.f36c3e99b8b4.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-03-12_at_2.31.16_pm.f36c3e99b8b4.png" width="3258" height="2248" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-12_at_2.31.16_pm.f36c3e99b8b4.png&amp;w=814&amp;sig=21666b4228a46a6bcc7aeca6d5263e62a3aeb6d5 814w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-12_at_2.31.16_pm.f36c3e99b8b4.png&amp;w=1629&amp;sig=01404157eddc09548f8cf7d4e995da9806c36fac 1629w, https://files.realpython.com/media/Screen_Shot_2019-03-12_at_2.31.16_pm.f36c3e99b8b4.png 3258w" sizes="75vw" alt="Python 3.8 DFA node graph"/></a></p>
3337 <p>The Python package used to generate this graph, <code>instaviz</code>, will be covered in a later chapter.</p>
3338 <h3 id="memory-management-in-cpython">Memory Management in CPython</h3>
3339 <p>Throughout this article, you will see references to a <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L128"><code>PyArena</code></a> object. The arena is one of CPython&rsquo;s memory management structures. The code is within <code>Python/pyarena.c</code> and contains a wrapper around C&rsquo;s memory allocation and deallocation functions.</p>
3340 <p>In a traditionally written C program, the developer <em>should</em> allocate memory for data structures before writing into that data. This allocation marks the memory as belonging to the process with the operating system.</p>
3341 <p>It is also up to the developer to deallocate, or &ldquo;free,&rdquo; the allocated memory when it&rsquo;s no longer being used and return it to the pool of free memory.
3342 If a process allocates heap memory, say within a function or loop, that memory is not automatically released in C when the function has completed. So if it hasn&rsquo;t been explicitly deallocated in the C code, it causes a memory leak. The process will take more memory each time that function runs until, eventually, the system runs out of memory and the program crashes!</p>
3343 <p>Python takes that responsibility away from the programmer and uses two algorithms: <a href="https://realpython.com/python-memory-management/">a reference counter and a garbage collector</a>.</p>
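<p>You can watch this automatic cleanup from Python itself. The sketch below uses the standard library&rsquo;s <code>tracemalloc</code> module to measure the memory traced while a large list is referenced and again after its only reference is deleted. The exact byte counts vary between builds and versions, so only the direction of the change is meaningful:</p>
<div class="highlight python"><pre><span></span>import tracemalloc

tracemalloc.start()

# Allocate 100,000 small string objects; CPython requests the memory for us.
data = [str(i) for i in range(100_000)]
in_use, _peak = tracemalloc.get_traced_memory()

# Drop the only reference: no explicit free() call is needed.
del data
after_del, _peak = tracemalloc.get_traced_memory()

tracemalloc.stop()

print(f"while referenced: {in_use:,} bytes")
print(f"after del:        {after_del:,} bytes")
</pre></div>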
3344 <p>Whenever an interpreter is instantiated, a <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L128"><code>PyArena</code></a> is created and attached to one of the fields in the interpreter. During the lifecycle of a CPython interpreter, many arenas can be allocated, and they are connected in a linked list. The arena stores a list of pointers to Python objects as a <code>PyListObject</code>. Whenever a new Python object is created, a pointer to it is added using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L203"><code>PyArena_AddPyObject()</code></a>. This function call stores a pointer in the arena&rsquo;s list, <code>a_objects</code>.</p>
3345 <div class="alert alert-primary" role="alert">
3346 <p>Even though Python doesn&rsquo;t have pointers, there are some <a href="https://realpython.com/pointers-in-python/">interesting techniques</a> to simulate the behavior of pointers.</p>
3347 </div>
3348 <p>The <code>PyArena</code> serves a second function, which is to allocate and reference a list of raw memory blocks. For example, a <code>PyList</code> would need extra memory if you added thousands of additional values. The <code>PyList</code> object&rsquo;s C code does not allocate memory directly. The object gets raw blocks of memory from the <code>PyArena</code> by calling <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L180"><code>PyArena_Malloc()</code></a> from the <code>PyObject</code> with the required memory size. This task is completed by another abstraction in <code>Objects/obmalloc.c</code>. In the object allocation module, memory can be allocated, freed, and reallocated for a Python object.</p>
3349 <p>A linked list of allocated blocks is stored inside the arena, so that when an interpreter is stopped, all managed memory blocks can be deallocated in one go using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L157"><code>PyArena_Free()</code></a>.</p>
3350 <p>Take the <code>PyListObject</code> example. If you were to <code>.append()</code> an object to the end of a Python list, you don&rsquo;t need to reallocate the memory used in the existing list beforehand. The <code>.append()</code> method calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/listobject.c#L36"><code>list_resize()</code></a>, which handles memory allocation for lists. Each list object records how many slots it has allocated, which can be more than the number of items it currently holds. If the item you&rsquo;re appending fits inside the existing free slots, it is simply added. If the list needs more space, it is expanded, and it over-allocates so that later appends stay cheap. As a list grows, its allocated capacity follows the sequence 0, 4, 8, 16, 25, 35, 46, 58, 72, 88.</p>
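<p>The growth sequence above comes from the over-allocation rule in <code>list_resize()</code>. The sketch below re-implements that rule as it appears in CPython 3.8, the version this article links to, and separately infers the capacity of a real list from <code>sys.getsizeof()</code>. Later CPython releases adjusted the formula, so the measured sequence may diverge from the simulated one after the first few steps. The helper names are illustrative, not CPython APIs:</p>
<div class="highlight python"><pre><span></span>import struct
import sys


def next_capacity_38(newsize):
    """Over-allocation rule used by list_resize() in CPython 3.8."""
    if newsize == 0:
        return 0
    return newsize + (newsize >> 3) + (6 if newsize >= 9 else 3)


def simulated_capacities(appends):
    """Capacities a list passes through while appending, per the 3.8 rule."""
    capacities = [0]
    size = allocated = 0
    for _ in range(appends):
        newsize = size + 1
        if newsize > allocated:  # no free slot left, so list_resize() grows
            allocated = next_capacity_38(newsize)
            capacities.append(allocated)
        size = newsize
    return capacities


def measured_capacities(appends):
    """Capacities of a real list, inferred from sys.getsizeof()."""
    items = []
    empty_size = sys.getsizeof(items)
    pointer_size = struct.calcsize("P")  # each slot holds one object pointer
    capacities = [0]
    for i in range(appends):
        items.append(i)
        capacity = (sys.getsizeof(items) - empty_size) // pointer_size
        if capacity != capacities[-1]:
            capacities.append(capacity)
    return capacities


print(simulated_capacities(80))  # [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
print(measured_capacities(80))   # depends on the running CPython version
</pre></div>
<p>Over-allocating proportionally to the current size is what keeps a long series of <code>.append()</code> calls fast: most appends land in a free slot, and only a few trigger a reallocation.</p>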
3351 <p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/obmalloc.c#L618"><code>PyMem_Realloc()</code></a> is called to expand the memory allocated in a list. <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/obmalloc.c#L618"><code>PyMem_Realloc()</code></a> is an API wrapper for <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/obmalloc.c#L1913"><code>pymalloc_realloc()</code></a>.</p>
3352 <p>Python also has a special wrapper for the C call <code>malloc()</code>, which sets the max size of the memory allocation to help prevent buffer overflow errors (See <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Modules/overlapped.c#L28"><code>PyMem_RawMalloc()</code></a>).</p>
3353 <p>In summary: </p>
3354 <ul>
3355 <li>Allocation of raw memory blocks is done via <code>PyMem_RawMalloc()</code>.</li>
3356 <li>The pointers to Python objects are stored within the <code>PyArena</code>.</li>
3357 <li><code>PyArena</code> also stores a linked list of allocated memory blocks.</li>
3358 </ul>
3359 <p>More information on the API is detailed on the <a href="https://docs.python.org/3/c-api/memory.html">CPython documentation</a>.</p>
3360 <h4 id="reference-counting">Reference Counting</h4>
3361 <p>To create a variable in Python, you have to assign a value to a <em>uniquely</em> named variable:</p>
3362 <div class="highlight python"><pre><span></span><span class="n">my_variable</span> <span class="o">=</span> <span class="mi">180392</span>
3363 </pre></div>
3364 
3365 <p>Whenever a value is assigned to a variable in Python, the name of the variable is checked within the locals and globals scope to see if it already exists.</p>
3366 <p>Because <code>my_variable</code> is not already within the <code>locals()</code> or <code>globals()</code> dictionary, a new object is created with the value of the numeric constant <code>180392</code>, and the name <code>my_variable</code> is bound to it.</p>
3367 <p>There is now one reference to this object, held by the name <code>my_variable</code>, so the object&rsquo;s reference counter is incremented by 1.</p>
3368 <p>You will see function calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L239"><code>Py_INCREF()</code></a> and <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L245"><code>Py_DECREF()</code></a> throughout the C source code for CPython. These functions increment and decrement the count of references to that object.</p>
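<p>From Python code, you can observe the same counter with <code>sys.getrefcount()</code>. This minimal sketch uses a list rather than the bare integer from the example above, because constants such as <code>180392</code> are also referenced by the compiled code object, which makes their absolute counts harder to read. The returned value is also one higher than you might expect, since passing the object to <code>getrefcount()</code> creates a temporary reference, so the sketch compares differences instead of absolute values:</p>
<div class="highlight python"><pre><span></span>import sys

my_variable = [180392]
baseline = sys.getrefcount(my_variable)

alias = my_variable  # binding a second name increments the reference count
with_alias = sys.getrefcount(my_variable)

del alias  # unbinding the name decrements it again
after_del = sys.getrefcount(my_variable)

print(with_alias - baseline)  # 1
print(after_del - baseline)   # 0
</pre></div>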
3369 <p>An object&rsquo;s reference count is decremented when a variable referring to it falls outside of the scope in which it was declared. Scope in Python can refer to a function or method, a comprehension, or a lambda function. These are some of the more literal scopes, but there are many other implicit scopes, like passing variables to a function call.</p>
3370 <p>The handling of incrementing and decrementing references based on the language is built into the CPython compiler and the core execution loop, <code>ceval.c</code>, which we will cover in detail later in this article.</p>
3371 <p>Whenever <code>Py_DECREF()</code> is called and the counter becomes 0, the object is deallocated, and its memory is typically released with the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/obmalloc.c#L707"><code>PyObject_Free()</code></a> function. Memory handed out by an arena is instead released all at once, by <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pyarena.c#L157"><code>PyArena_Free()</code></a>, when the arena itself is freed.</p>
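<p>To see that an object is released as soon as its last reference disappears, you can attach a finalizer with <code>weakref.finalize()</code>, which does not keep the object alive. The <code>Tracked</code> class and <code>make_local()</code> function are hypothetical names for this sketch. In CPython, the finalizer runs the moment the function returns, because the local variable held the only reference; Python implementations without reference counting may run it later:</p>
<div class="highlight python"><pre><span></span>import weakref


class Tracked:
    """A hypothetical object type we want to watch being deallocated."""


events = []


def make_local():
    obj = Tracked()  # the local name holds the only reference
    weakref.finalize(obj, events.append, "Tracked object freed")
    events.append("function returning")
    # Returning drops the local reference, so the count reaches zero.


make_local()
print(events)  # ['function returning', 'Tracked object freed']
</pre></div>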
3372 <h4 id="garbage-collection">Garbage Collection</h4>
3373 <p>How often does your garbage get collected? Weekly, or fortnightly? </p>
3374 <p>When you&rsquo;re finished with something, you discard it and throw it in the trash. But that trash won&rsquo;t get collected straight away. You need to wait for the garbage trucks to come and pick it up.</p>
3375 <p>CPython follows the same principle, using a garbage collection algorithm. CPython&rsquo;s garbage collector is enabled by default and runs automatically, deallocating memory used by objects that are no longer in use but that reference counting alone cannot free, such as groups of objects that refer to each other in a reference cycle.</p>
3376 <p>Because the garbage collection algorithm is a lot more complex than the reference counter, it doesn&rsquo;t run all the time; otherwise, it would consume a huge amount of CPU resources. Instead, it runs periodically, once the number of object allocations minus deallocations passes a set threshold.</p>
3377 <p>CPython&rsquo;s standard library comes with a Python module to interface with the garbage collector, the <code>gc</code> module. Here&rsquo;s how to use the <code>gc</code> module in debug mode:</p>
3378 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">gc</span>
3379 <span class="gp">&gt;&gt;&gt; </span><span class="n">gc</span><span class="o">.</span><span class="n">set_debug</span><span class="p">(</span><span class="n">gc</span><span class="o">.</span><span class="n">DEBUG_STATS</span><span class="p">)</span>
3380 </pre></div>
3381 
3382 <p>This will print the statistics whenever the garbage collector is run.</p>
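<p>If you would rather record collections programmatically than read the debug output, the <code>gc.callbacks</code> list lets you register a function that the collector calls before and after each run. This sketch registers a callback, forces a full collection, and then unregisters the callback; the <code>on_gc</code> name is illustrative:</p>
<div class="highlight python"><pre><span></span>import gc

runs = []


def on_gc(phase, info):
    # phase is "start" or "stop"; info holds the "generation",
    # "collected", and "uncollectable" values for the run.
    if phase == "stop":
        runs.append((info["generation"], info["collected"]))


gc.callbacks.append(on_gc)
try:
    gc.collect()  # force a full collection
finally:
    gc.callbacks.remove(on_gc)  # always unregister the callback

print(runs)
</pre></div>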
3383 <p>You can get the thresholds that trigger a collection, one for each of the collector&rsquo;s three generations, by calling <code>gc.get_threshold()</code>:</p>
3384 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">gc</span><span class="o">.</span><span class="n">get_threshold</span><span class="p">()</span>
3385 <span class="go">(700, 10, 10)</span>
3386 </pre></div>
3387 
3388 <p>You can also get the current collection counts for each generation with <code>gc.get_count()</code>:</p>
3389 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">gc</span><span class="o">.</span><span class="n">get_count</span><span class="p">()</span>
3390 <span class="go">(688, 1, 1)</span>
3391 </pre></div>
3392 
3393 <p>Lastly, you can run the collection algorithm manually:</p>
3394 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">gc</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
3395 <span class="go">24</span>
3396 </pre></div>
3397 
3398 <p>This will call <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Modules/gcmodule.c#L987"><code>collect()</code></a> inside the <code>Modules/gcmodule.c</code> file which contains the implementation of the garbage collector algorithm.</p>
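<p>The collector&rsquo;s main job is finding reference cycles, which reference counting alone never frees. In this minimal sketch, two hypothetical <code>Node</code> objects point at each other, so their reference counts stay above zero after <code>del</code>. Automatic collection is paused so that only the explicit <code>gc.collect()</code> call cleans up the cycle, and its return value is the number of unreachable objects it found:</p>
<div class="highlight python"><pre><span></span>import gc
import weakref


class Node:
    """A hypothetical object that can point at another Node."""

    def __init__(self):
        self.partner = None


was_enabled = gc.isenabled()
gc.disable()  # pause automatic collection so only gc.collect() runs below
try:
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # a -> b -> a: a reference cycle
    probe = weakref.ref(a)       # watches a without keeping it alive

    del a, b
    alive_before = probe() is not None  # the cycle keeps both nodes alive

    found = gc.collect()  # number of unreachable objects the collector found
    alive_after = probe() is not None   # the cycle has been reclaimed
finally:
    if was_enabled:
        gc.enable()

print(alive_before, alive_after, found)
</pre></div>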
3399 <h3 id="conclusion">Conclusion</h3>
3400 <p>In Part 1, you covered the structure of the source code repository, how to compile from source, and the Python language specification. These core concepts will be critical in Part 2 as you dive deeper into the Python interpreter process.</p>
3401 <h2 h1="h1" id="part-2-the-python-interpreter-process">Part 2: The Python Interpreter Process</h2>
3402 <p>Now that you&rsquo;ve seen the Python grammar and memory management, you can follow the process from typing <code>python</code> to the part where your code is executed.</p>
3403 <p>There are five ways the <code>python</code> binary can be called:</p>
3404 <ol>
3405 <li>To run a single command with <code>-c</code> and a Python command</li>
3406 <li>To start a module with <code>-m</code> and the name of a module</li>
3407 <li>To run a file with the filename</li>
3408 <li>To run the <code>stdin</code> input using a shell pipe</li>
3409 <li>To start the REPL and execute commands one at a time</li>
3410 </ol>
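<p>The first four modes can be driven non-interactively, which makes them easy to compare side by side. This sketch uses <code>subprocess</code> and <code>sys.executable</code> (the interpreter running the script) to invoke each one. The <code>run()</code> helper and the temporary <code>hello.py</code> file are illustrative, and the REPL is left out because it expects an interactive terminal:</p>
<div class="highlight python"><pre><span></span>import pathlib
import subprocess
import sys
import tempfile


def run(args, stdin_text=None):
    """Run the current interpreter with args and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, *args],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# 1. A single command with -c
from_command = run(["-c", "print('from -c')"])

# 2. A module with -m (json.tool pretty-prints the JSON it reads from stdin)
from_module = run(["-m", "json.tool"], stdin_text='{"mode": "-m"}')

# 3. A file passed by name
with tempfile.TemporaryDirectory() as tmp:
    script = pathlib.Path(tmp) / "hello.py"
    script.write_text("print('from a file')\n")
    from_file = run([str(script)])

# 4. Source piped in through stdin (no arguments, stdin is not a terminal)
from_stdin = run([], stdin_text="print('from stdin')\n")

print(from_command, from_module, from_file, from_stdin, sep="\n")
</pre></div>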
3411 <div class="alert alert-primary" role="alert">
3412 <p>Python has so many ways to execute scripts, it can be a little overwhelming. Darren Jones has put together a <a href="https://realpython.com/courses/running-python-scripts/">great course on running Python scripts</a> if you want to learn more.</p>
3413 </div>
3414 <p>The three source files you need to inspect to see this process are:</p>
3415 <ol>
3416 <li><strong><code>Programs/python.c</code></strong> is a simple entry point.</li>
3417 <li><strong><code>Modules/main.c</code></strong> contains the code to bring together the whole process, loading configuration, executing code and clearing up memory.</li>
3418 <li><strong><code>Python/initconfig.c</code></strong> loads the configuration from the system environment and merges it with any command-line flags.</li>
3419 </ol>
3420 <p>This diagram shows how each of those functions is called:</p>
3421 <p><a href="https://files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png" width="1046" height="851" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png&amp;w=261&amp;sig=8aa36cedaf32be0236896cce2c32b6c4c4ec7e05 261w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png&amp;w=523&amp;sig=a447b53d2afa96dfd204b9265c8b439bd71272d7 523w, https://files.realpython.com/media/swim-lanes-chart-1.9fb3000aad85.png 1046w" sizes="75vw" alt="Python run swim lane diagram"/></a></p>
3422 <p>The execution mode is determined from the configuration.</p>
3423 <div class="alert alert-primary" role="alert">
3424 <p><strong>The CPython source code style:</strong></p>
3425 <p>Similar to the <a href="https://realpython.com/courses/writing-beautiful-python-code-pep-8/">PEP 8 style guide for Python code</a>, there is an <a href="https://www.python.org/dev/peps/pep-0007/">official style guide</a> for the CPython C code, PEP 7, originally written in 2001 and updated for modern versions.</p>
3426 <p>There are some naming standards which help when navigating the source code:</p>
3427 <ul>
3428 <li>
3429 <p>Use a <code>Py</code> prefix for public functions, never for static functions. The <code>Py_</code> prefix is reserved for global service routines like <code>Py_FatalError</code>. Specific groups of routines (like specific object type APIs) use a longer prefix, such as <code>PyString_</code> for string functions.</p>
3430 </li>
3431 <li>
3432 <p>Public functions and variables use MixedCase with underscores, like this: <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L924"><code>PyObject_GetAttr</code></a>, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/modsupport.h#L20"><code>Py_BuildValue</code></a>, <code>PyExc_TypeError</code>.</p>
3433 </li>
3434 <li>
3435 <p>Occasionally an &ldquo;internal&rdquo; function has to be visible to the loader. We use the <code>_Py</code> prefix for this, for example, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L464"><code>_PyObject_Dump</code></a>.</p>
3436 </li>
3437 <li>
3438 <p>Macros should have a MixedCase prefix and then use upper case, for example <code>PyString_AS_STRING</code>, <code>Py_PRINT_RAW</code>.</p>
3439 </li>
3440 </ul>
3441 </div>
3442 <h3 id="establishing-runtime-configuration">Establishing Runtime Configuration</h3>
3444 <p>In the swimlanes, you can see that before any Python code is executed, the runtime first establishes the configuration.
3445 The configuration of the runtime is a data structure defined in <code>Include/cpython/initconfig.h</code> named <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/cpython/initconfig.h#L407"><code>PyConfig</code></a>.</p>
3446 <p>The configuration data structure includes things like:</p>
3447 <ul>
3448 <li>Runtime flags for various modes like debug and optimized mode</li>
3449 <li>The execution mode, such as whether a filename, <code>stdin</code> input, or a module name was provided</li>
3450 <li>Extended options, specified by <code>-X &lt;option&gt;</code></li>
3451 <li>Environment variables for runtime settings</li>
3452 </ul>
3453 <p>The configuration data is primarily used by the CPython runtime to enable and disable various features.</p>
3454 <p>Python also comes with several <a href="https://docs.python.org/3/using/cmdline.html">Command Line Interface Options</a>. In Python you can enable verbose mode with the <code>-v</code> flag. In verbose mode, Python will print messages to the screen when modules are loaded:</p>
3455 <div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe -v -c <span class="s2">&quot;print(&#39;hello world&#39;)&quot;</span>
3456 
3457 
3458 <span class="gp">#</span> installing zipimport hook
3459 <span class="go">import zipimport # builtin</span>
3460 <span class="gp">#</span> installed zipimport hook
3461 <span class="go">...</span>
3462 </pre></div>
3463 
3464 <p>You will see a hundred lines or more with all the imports of your user site-packages and anything else in the system environment.</p>
3465 <p>You can see the definition of this flag within <code>Include/cpython/initconfig.h</code> inside the <code>struct</code> for <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/cpython/initconfig.h#L407"><code>PyConfig</code></a>:</p>
3466 <div class="highlight c"><pre><span></span><span class="cm">/* --- PyConfig ---------------------------------------------- */</span>
3467 
3468 <span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
3469     <span class="kt">int</span> <span class="n">_config_version</span><span class="p">;</span>  <span class="cm">/* Internal configuration version,</span>
3470 <span class="cm">                             used for ABI compatibility */</span>
3471     <span class="kt">int</span> <span class="n">_config_init</span><span class="p">;</span>     <span class="cm">/* _PyConfigInitEnum value */</span>
3472 
3473     <span class="p">...</span>
3474 
3475     <span class="cm">/* If greater than 0, enable the verbose mode: print a message each time a</span>
3476 <span class="cm">       module is initialized, showing the place (filename or built-in module)</span>
3477 <span class="cm">       from which it is loaded.</span>
3478 
3479 <span class="cm">       If greater or equal to 2, print a message for each file that is checked</span>
3480 <span class="cm">       for when searching for a module. Also provides information on module</span>
3481 <span class="cm">       cleanup at exit.</span>
3482 
3483 <span class="cm">       Incremented by the -v option. Set by the PYTHONVERBOSE environment</span>
3484 <span class="cm">       variable. If set to -1 (default), inherit Py_VerboseFlag value. */</span>
3485     <span class="kt">int</span> <span class="n">verbose</span><span class="p">;</span>
3486 </pre></div>
3487 
3488 <p>In <code>Python/initconfig.c</code>, the logic for reading settings from environment variables and runtime command-line flags is established.</p>
3489 <p>In the <code>config_read_env_vars</code> function, the environment variables are read and used to assign the values for the configuration settings:</p>
3490 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyStatus</span>
3491 <span class="nf">config_read_env_vars</span><span class="p">(</span><span class="n">PyConfig</span> <span class="o">*</span><span class="n">config</span><span class="p">)</span>
3492 <span class="p">{</span>
3493     <span class="n">PyStatus</span> <span class="n">status</span><span class="p">;</span>
3494     <span class="kt">int</span> <span class="n">use_env</span> <span class="o">=</span> <span class="n">config</span><span class="o">-&gt;</span><span class="n">use_environment</span><span class="p">;</span>
3495 
3496     <span class="cm">/* Get environment variables */</span>
3497 <span class="hll">    <span class="n">_Py_get_env_flag</span><span class="p">(</span><span class="n">use_env</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">config</span><span class="o">-&gt;</span><span class="n">parser_debug</span><span class="p">,</span> <span class="s">&quot;PYTHONDEBUG&quot;</span><span class="p">);</span>
3498 </span>    <span class="n">_Py_get_env_flag</span><span class="p">(</span><span class="n">use_env</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">config</span><span class="o">-&gt;</span><span class="n">verbose</span><span class="p">,</span> <span class="s">&quot;PYTHONVERBOSE&quot;</span><span class="p">);</span>
3499     <span class="n">_Py_get_env_flag</span><span class="p">(</span><span class="n">use_env</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">config</span><span class="o">-&gt;</span><span class="n">optimization_level</span><span class="p">,</span> <span class="s">&quot;PYTHONOPTIMIZE&quot;</span><span class="p">);</span>
3500     <span class="n">_Py_get_env_flag</span><span class="p">(</span><span class="n">use_env</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">config</span><span class="o">-&gt;</span><span class="n">inspect</span><span class="p">,</span> <span class="s">&quot;PYTHONINSPECT&quot;</span><span class="p">);</span>
3501 </pre></div>
3502 
3503 <p>For the verbose setting, you can see that the value of <code>PYTHONVERBOSE</code> is used to set <code>config-&gt;verbose</code> (the code passes its address, <code>&amp;config-&gt;verbose</code>), if <code>PYTHONVERBOSE</code> is found. If the environment variable does not exist, then the default value of <code>-1</code> will remain.</p>
3504 <p>Then in <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/initconfig.c#L1828"><code>config_parse_cmdline</code></a> within <code>initconfig.c</code> again, the command-line flag is used to set the value, if provided:</p>
3505 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyStatus</span>
3506 <span class="nf">config_parse_cmdline</span><span class="p">(</span><span class="n">PyConfig</span> <span class="o">*</span><span class="n">config</span><span class="p">,</span> <span class="n">PyWideStringList</span> <span class="o">*</span><span class="n">warnoptions</span><span class="p">,</span>
3507                      <span class="n">Py_ssize_t</span> <span class="o">*</span><span class="n">opt_index</span><span class="p">)</span>
3508 <span class="p">{</span>
3509 <span class="p">...</span>
3510 
3511         <span class="k">switch</span> <span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span>
3512 <span class="p">...</span>
3513 
3514         <span class="k">case</span> <span class="sc">&#39;v&#39;</span><span class="o">:</span>
3515 <span class="hll">            <span class="n">config</span><span class="o">-&gt;</span><span class="n">verbose</span><span class="o">++</span><span class="p">;</span>
3516 </span>            <span class="k">break</span><span class="p">;</span>
3517 <span class="p">...</span>
3518         <span class="cm">/* This space reserved for other options */</span>
3519 
3520         <span class="k">default</span><span class="o">:</span>
3521             <span class="cm">/* unknown argument: parsing failed */</span>
3522             <span class="n">config_usage</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">program</span><span class="p">);</span>
3523             <span class="k">return</span> <span class="n">_PyStatus_EXIT</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
3524         <span class="p">}</span>
3525     <span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">);</span>
3526 </pre></div>
3527 
3528 <p>This value is later copied to a global variable <code>Py_VerboseFlag</code> by the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/initconfig.c#L134"><code>_Py_GetGlobalVariablesAsDict</code></a> function.</p>
3529 <p>Within a Python session, you can access the runtime flags, such as verbose mode and quiet mode, using the <code>sys.flags</code> named tuple.
3530 The <code>-X</code> flags are all available inside the <code>sys._xoptions</code> dictionary:</p>
3531 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="go">$ ./python.exe -X dev -q       </span>
3532 
3533 <span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">sys</span>
3534 <span class="gp">&gt;&gt;&gt; </span><span class="n">sys</span><span class="o">.</span><span class="n">flags</span>
3535 <span class="go">sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, </span>
3536 <span class="go"> no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, </span>
3537 <span class="go"> quiet=1, hash_randomization=1, isolated=0, dev_mode=True, utf8_mode=0)</span>
3538 
3539 <span class="gp">&gt;&gt;&gt; </span><span class="n">sys</span><span class="o">.</span><span class="n">_xoptions</span>
3540 <span class="go">{&#39;dev&#39;: True}</span>
3541 </pre></div>
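<p>You can check how the environment variable and the command-line flag feed into the same setting by starting child interpreters and reading <code>sys.flags.verbose</code> and <code>sys._xoptions</code>. In this sketch, the <code>child_report()</code> helper is illustrative, and it removes any inherited <code>PYTHONVERBOSE</code> first so that the results are predictable. Each <code>-v</code> bumps the value once, as in the <code>config-&gt;verbose++</code> case above:</p>
<div class="highlight python"><pre><span></span>import os
import subprocess
import sys

# Code the child interpreter runs to report its own settings.
REPORT = "import sys; print(sys.flags.verbose, sys._xoptions)"


def child_report(options, extra_env=None):
    """Run a child interpreter with options and return what it prints."""
    # Drop any inherited PYTHONVERBOSE so only this call's settings apply.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONVERBOSE"}
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *options, "-c", REPORT],
        env=env,
        capture_output=True,  # -v writes its import trace to stderr
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(child_report([]))                          # 0 {}
print(child_report(["-v"]))                      # 1 {}
print(child_report(["-v", "-v"]))                # 2 {}
print(child_report([], {"PYTHONVERBOSE": "1"}))  # 1 {}
print(child_report(["-X", "dev"]))               # 0 {'dev': True}
</pre></div>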
3542 
3543 <p>As well as the runtime configuration in <code>initconfig.h</code>, there is also the build configuration, which is located inside <code>pyconfig.h</code> in the root folder. This file is created dynamically in the <code>configure</code> step in the build process, or by Visual Studio for Windows systems.</p>
3544 <p>You can see the build configuration by running:</p>
3545 <div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe -m sysconfig
3546 </pre></div>
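<p>The same build information is available programmatically through the standard library&rsquo;s <code>sysconfig</code> module. In this sketch, <code>get_config_h_filename()</code> returns the path of the <code>pyconfig.h</code> file associated with the running interpreter, and <code>get_config_var()</code> looks up a single build variable. Which variables exist depends on the platform, and a variable that isn&rsquo;t defined comes back as <code>None</code>:</p>
<div class="highlight python"><pre><span></span>import sysconfig

# Path of the pyconfig.h generated for this interpreter's build.
print(sysconfig.get_config_h_filename())

# Individual build variables; undefined ones are returned as None.
print(sysconfig.get_config_var("Py_DEBUG"))  # 1 on a --with-pydebug build
print(sysconfig.get_config_var("CC"))        # C compiler used, if recorded

# All variables, as listed under "Variables" by python -m sysconfig.
config = sysconfig.get_config_vars()
print(len(config), "build variables")
</pre></div>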
3547 
3548 <h3 id="reading-filesinput">Reading Files/Input</h3>
3549 <p>Once CPython has the runtime configuration and the command-line arguments, it can establish what it needs to execute.</p>
3550 <p>This task is handled by the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Modules/main.c#L665"><code>pymain_main</code></a> function inside <code>Modules/main.c</code>. Depending on the newly created <code>config</code> instance, CPython will now execute code provided via several options.</p>
3551 <h4 id="input-via-c">Input via <code>-c</code></h4>
3552 <p>The simplest option is to provide CPython with a command using the <code>-c</code> option and a Python program inside quotes.</p>
3553 <p>For example:</p>
3554 <div class="highlight sh"><pre><span></span><span class="gp">$</span> ./python.exe -c <span class="s2">&quot;print(&#39;hi&#39;)&quot;</span>
3555 <span class="go">hi</span>
3556 </pre></div>
3557 
3558 <p>Here is the full flowchart of how this happens:</p>
3559 <p><a href="https://files.realpython.com/media/pymain_run_command.f5da561ba7d5.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/pymain_run_command.f5da561ba7d5.png" width="1041" height="751" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pymain_run_command.f5da561ba7d5.png&amp;w=260&amp;sig=f7f802b23e900bf42b29804ee80ed0dd0eaec6a4 260w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/pymain_run_command.f5da561ba7d5.png&amp;w=520&amp;sig=3079060f305945fd3556ce7cd453fac965a6ec27 520w, https://files.realpython.com/media/pymain_run_command.f5da561ba7d5.png 1041w" sizes="75vw" alt="Flow chart of pymain_run_command"/></a></p>
3560 <p>First, the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Modules/main.c#L240"><code>pymain_run_command()</code></a> function is executed inside <code>Modules/main.c</code>, taking the command passed in <code>-c</code> as an argument of the C type <code>wchar_t*</code>. The <code>wchar_t*</code> type is often used as a low-level storage type for Unicode data across CPython, because a <code>wchar_t</code> is wide enough to hold Unicode code units: 32 bits (UTF-32) on most Unix-like platforms and 16 bits (UTF-16) on Windows.</p>
3561 <p>When converting the <code>wchar_t*</code> to a Python string, the <code>Objects/unicodeobject.c</code> file has a helper function <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/unicodeobject.c#L2088"><code>PyUnicode_FromWideChar()</code></a> that returns a <code>PyObject</code> of type <code>str</code>. The encoding to UTF-8 is then done by <code>PyUnicode_AsUTF8String()</code> on the Python <code>str</code> object to convert it to a Python <code>bytes</code> object.</p>
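<p>The conversion chain can be approximated at the Python level. This sketch is only an analogy, not what <code>pymain_run_command()</code> literally executes: <code>ctypes.c_wchar</code> reports the size of <code>wchar_t</code> on the current platform, <code>str.encode("utf-8")</code> plays the role of <code>PyUnicode_AsUTF8String()</code>, and the built-in <code>compile()</code> accepts the resulting UTF-8 bytes as source code. The command string and the <code>"command_line"</code> filename are made up for illustration:</p>
<div class="highlight python"><pre><span></span>import ctypes

# wchar_t is usually 4 bytes on Unix-like platforms and 2 bytes on Windows.
print(ctypes.sizeof(ctypes.c_wchar))

# Illustrative command containing a non-ASCII character.
command = "greeting = 'héllo'"

# Rough analog of PyUnicode_AsUTF8String(): str -> UTF-8 encoded bytes.
encoded = command.encode("utf-8")
print(len(command), len(encoded))  # 18 19, because 'é' takes two bytes

# compile() accepts UTF-8 source bytes; "command_line" is a made-up filename.
code = compile(encoded, "command_line", "exec")
namespace = {"__name__": "__main__"}
exec(code, namespace)
print(ascii(namespace["greeting"]))  # 'h\xe9llo'
</pre></div>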
3562 <p>Once this is complete, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Modules/main.c#L240"><code>pymain_run_command()</code></a> will then pass the contents of the Python <code>bytes</code> object to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L453"><code>PyRun_SimpleStringFlags()</code></a> for execution, first extracting them as a C string (<code>char *</code>):</p>
3563 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
3564 <span class="nf">pymain_run_command</span><span class="p">(</span><span class="kt">wchar_t</span> <span class="o">*</span><span class="n">command</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">cf</span><span class="p">)</span>
3565 <span class="p">{</span>
3566     <span class="n">PyObject</span> <span class="o">*</span><span class="n">unicode</span><span class="p">,</span> <span class="o">*</span><span class="n">bytes</span><span class="p">;</span>
3567     <span class="kt">int</span> <span class="n">ret</span><span class="p">;</span>
3568 
3569     <span class="n">unicode</span> <span class="o">=</span> <span class="n">PyUnicode_FromWideChar</span><span class="p">(</span><span class="n">command</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">);</span>
3570     <span class="k">if</span> <span class="p">(</span><span class="n">unicode</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
3571         <span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
3572     <span class="p">}</span>
3573 
3574     <span class="k">if</span> <span class="p">(</span><span class="n">PySys_Audit</span><span class="p">(</span><span class="s">&quot;cpython.run_command&quot;</span><span class="p">,</span> <span class="s">&quot;O&quot;</span><span class="p">,</span> <span class="n">unicode</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
3575         <span class="k">return</span> <span class="n">pymain_exit_err_print</span><span class="p">();</span>
3576     <span class="p">}</span>
3577 
3578     <span class="n">bytes</span> <span class="o">=</span> <span class="n">PyUnicode_AsUTF8String</span><span class="p">(</span><span class="n">unicode</span><span class="p">);</span>
3579     <span class="n">Py_DECREF</span><span class="p">(</span><span class="n">unicode</span><span class="p">);</span>
3580     <span class="k">if</span> <span class="p">(</span><span class="n">bytes</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
3581         <span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
3582     <span class="p">}</span>
3583 
3584     <span class="n">ret</span> <span class="o">=</span> <span class="n">PyRun_SimpleStringFlags</span><span class="p">(</span><span class="n">PyBytes_AsString</span><span class="p">(</span><span class="n">bytes</span><span class="p">),</span> <span class="n">cf</span><span class="p">);</span>
3585     <span class="n">Py_DECREF</span><span class="p">(</span><span class="n">bytes</span><span class="p">);</span>
3586     <span class="k">return</span> <span class="p">(</span><span class="n">ret</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">);</span>
3587 
3588 <span class="nl">error</span><span class="p">:</span>
3589     <span class="n">PySys_WriteStderr</span><span class="p">(</span><span class="s">&quot;Unable to decode the command from the command line:</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span>
3590     <span class="k">return</span> <span class="n">pymain_exit_err_print</span><span class="p">();</span>
3591 <span class="p">}</span>
3592 </pre></div>
3593 
<p>The conversion of <code>wchar_t*</code> to a Python <code>str</code>, then to <code>bytes</code>, and then to a C string is roughly equivalent to the following:</p>
<div class="highlight python"><pre><span></span><span class="n">unicode</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">command</span><span class="p">)</span>  <span class="c1"># PyUnicode_FromWideChar()</span>
<span class="n">bytes_</span> <span class="o">=</span> <span class="n">unicode</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">&#39;utf8&#39;</span><span class="p">)</span>  <span class="c1"># PyUnicode_AsUTF8String()</span>
3597 <span class="c1"># call PyRun_SimpleStringFlags with bytes_</span>
3598 </pre></div>
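<p>The snippet above can&rsquo;t run on its own because <code>command</code> stands in for the C argument. Below is a self-contained version. It assumes a hypothetical command string containing one non-ASCII character and shows that the round trip through UTF-8 preserves the text, even though the byte count differs from the character count:</p>
<div class="highlight python"><pre># Hypothetical command, as if passed via: python -c 'print("héllo")'
command = 'print("héllo")'

unicode = str(command)           # PyUnicode_FromWideChar(): build a str object
bytes_ = unicode.encode("utf8")  # PyUnicode_AsUTF8String(): encode to UTF-8 bytes

# "é" takes two bytes in UTF-8, so the bytes object is one item longer
print(len(unicode), len(bytes_))  # 14 15

# PyRun_SimpleStringFlags() receives these bytes as a C char*;
# decoding them recovers the original source text
print(bytes_.decode("utf8") == command)  # True
</pre></div>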
3599 
<p>The <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L453"><code>PyRun_SimpleStringFlags()</code></a> function is part of <code>Python/pythonrun.c</code>. Its purpose is to run this simple command as though it were the body of a Python module. A program executed directly runs in the <code>__main__</code> module, so the function first fetches that module with <code>PyImport_AddModule("__main__")</code>, which creates the module if it doesn&rsquo;t exist yet. It then uses the module&rsquo;s dictionary as both the global and the local namespace:</p>
3602 <div class="highlight c"><pre><span></span><span class="kt">int</span>
3603 <span class="nf">PyRun_SimpleStringFlags</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">command</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
3604 <span class="p">{</span>
3605     <span class="n">PyObject</span> <span class="o">*</span><span class="n">m</span><span class="p">,</span> <span class="o">*</span><span class="n">d</span><span class="p">,</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
3606 <span class="hll">    <span class="n">m</span> <span class="o">=</span> <span class="n">PyImport_AddModule</span><span class="p">(</span><span class="s">&quot;__main__&quot;</span><span class="p">);</span>
3607 </span>    <span class="k">if</span> <span class="p">(</span><span class="n">m</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
3608         <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
3609 <span class="hll">    <span class="n">d</span> <span class="o">=</span> <span class="n">PyModule_GetDict</span><span class="p">(</span><span class="n">m</span><span class="p">);</span>
3610 </span><span class="hll">    <span class="n">v</span> <span class="o">=</span> <span class="n">PyRun_StringFlags</span><span class="p">(</span><span class="n">command</span><span class="p">,</span> <span class="n">Py_file_input</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
3611 </span>    <span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
3612         <span class="n">PyErr_Print</span><span class="p">();</span>
3613         <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
3614     <span class="p">}</span>
3615     <span class="n">Py_DECREF</span><span class="p">(</span><span class="n">v</span><span class="p">);</span>
3616     <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
3617 <span class="p">}</span>
3618 </pre></div>
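<p>The same steps can be sketched in Python. The <code>run_simple_string()</code> helper below is hypothetical. It creates a fresh module object named <code>__main__</code> and executes the command with that module&rsquo;s dictionary as both globals and locals. On failure, it prints the traceback the way <code>PyErr_Print()</code> would. It returns the same <code>0</code> or <code>-1</code> status as the C function. Unlike the C function, it doesn&rsquo;t register the module in <code>sys.modules</code>, so the running interpreter&rsquo;s real <code>__main__</code> is left alone:</p>
<div class="highlight python"><pre>import traceback
import types


def run_simple_string(command):
    """Hypothetical Python analogue of PyRun_SimpleStringFlags()."""
    # PyImport_AddModule("__main__"): a detached module object in this sketch
    main = types.ModuleType("__main__")
    # PyModule_GetDict(m)
    d = vars(main)
    try:
        # PyRun_StringFlags(command, Py_file_input, d, d, flags)
        exec(command, d, d)
    except Exception:
        # PyErr_Print(): report the error on stderr
        traceback.print_exc()
        return -1
    return 0


status = run_simple_string("x = 6 * 7\nprint(x)")  # prints 42
print(status)  # 0
</pre></div>
<p>Calling <code>run_simple_string("1/0")</code> instead prints a <code>ZeroDivisionError</code> traceback and returns <code>-1</code>.</p>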
3619 
<p>Once <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L453"><code>PyRun_SimpleStringFlags()</code></a> has the <code>__main__</code> module and its dictionary, it calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1008"><code>PyRun_StringFlags()</code></a>. That function uses a placeholder filename, <code>&lt;string&gt;</code>, and then calls the Python parser to build an Abstract Syntax Tree (AST) from the string. The parser returns the tree&rsquo;s root module node, <code>mod</code>:</p>
3621 <div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
3622 <span class="nf">PyRun_StringFlags</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">str</span><span class="p">,</span> <span class="kt">int</span> <span class="n">start</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span>
3623                   <span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
3624 <span class="p">{</span>
3625 <span class="p">...</span>
3626     <span class="n">mod</span> <span class="o">=</span> <span class="n">PyParser_ASTFromStringObject</span><span class="p">(</span><span class="n">str</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
3627     <span class="k">if</span> <span class="p">(</span><span class="n">mod</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span>
3628         <span class="n">ret</span> <span class="o">=</span> <span class="n">run_mod</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
3629     <span class="n">PyArena_Free</span><span class="p">(</span><span class="n">arena</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
3631 </pre></div>
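<p>The standard <code>ast</code> module exposes the same parse step to Python code, which shows that <code>mod</code> is a tree of nodes rather than a module object. In this sketch, <code>mode="exec"</code> plays the role of <code>Py_file_input</code>. The filename <code>"command_line"</code> is an arbitrary stand-in for CPython&rsquo;s <code>&lt;string&gt;</code> placeholder:</p>
<div class="highlight python"><pre>import ast

source = "x = 6 * 7"

# Parse the source into an AST, like PyParser_ASTFromStringObject()
tree = ast.parse(source, filename="command_line", mode="exec")
print(type(tree).__name__)  # Module

# Compile and run the AST, roughly what run_mod() does
namespace = {"__name__": "__main__"}
exec(compile(tree, "command_line", "exec"), namespace, namespace)
print(namespace["x"])  # 42
</pre></div>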
3632 
3633 <p>You&rsquo;ll dive into the AST and Parser code in the next section.</p>
3634 <h4 id="input-via-m">Input via <code>-m</code></h4>
3635 <p>Another way to execute Python commands is by using the <code>-m</code> option with the name of a module.
3636 A typical example is <code>python -m unittest</code> to run the unittest module in the standard library.</p>
<p>The ability to execute modules as scripts was initially proposed in <a href="https://www.python.org/dev/peps/pep-0338">PEP 338</a>. <a href="https://www.python.org/dev/peps/pep-0366">PEP 366</a> later defined how explicit relative imports work in modules run this way.</p>
<p>Using the <code>-m</code> flag means that you want to run the named module as the <a href="https://realpython.com/python-main-function/"><code>__main__</code></a> module. If the name refers to a package, CPython runs the package&rsquo;s <code>__main__</code> submodule instead. The flag also means that CPython should search <code>sys.path</code> for the named module.</p>
3639 <p>This search mechanism is why you don&rsquo;t need to remember where the <code>unittest</code> module is stored on your filesystem.</p>
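<p>You can watch this lookup from Python with <code>importlib.util.find_spec()</code>. It asks the same finders the import system uses, which for most modules means searching <code>sys.path</code>. This sketch only locates <code>unittest</code> and its <code>__main__</code> submodule and doesn&rsquo;t run anything. Note, though, that finding a submodule imports its parent package:</p>
<div class="highlight python"><pre>import importlib.util
import sys

# Ask the import system's finders where "unittest" lives, without running it
spec = importlib.util.find_spec("unittest")
print(spec.name)    # unittest
print(spec.origin)  # typically .../unittest/__init__.py in your standard library

# unittest is a package, so python -m unittest runs its __main__ submodule;
# finding that submodule imports the parent package first
main_spec = importlib.util.find_spec("unittest.__main__")
print(main_spec.origin)

# The directories searched for top-level modules
for entry in sys.path:
    print(entry)
</pre></div>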
<p>Inside <code>Modules/main.c</code>, the <code>pymain_run_module()</code> function is called when the command line includes the <code>-m</code> flag. The name of the module is passed as the <code>modname</code> argument.</p>
<p>CPython then imports the standard library module <code>runpy</code> and calls its <code>_run_module_as_main()</code> function using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/call.c#L214"><code>PyObject_Call()</code></a>. The import is done using the C API function <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/import.c#L1409"><code>PyImport_ImportModule()</code></a>, found within the <code>Python/import.c</code> file:</p>
3642 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
3643 <span class="nf">pymain_run_module</span><span class="p">(</span><span class="k">const</span> <span class="kt">wchar_t</span> <span class="o">*</span><span class="n">modname</span><span class="p">,</span> <span class="kt">int</span> <span class="n">set_argv0</span><span class="p">)</span>
3644 <span class="p">{</span>
3645     <span class="n">PyObject</span> <span class="o">*</span><span class="n">module</span><span class="p">,</span> <span class="o">*</span><span class="n">runpy</span><span class="p">,</span> <span class="o">*</span><span class="n">runmodule</span><span class="p">,</span> <span class="o">*</span><span class="n">runargs</span><span class="p">,</span> <span class="o">*</span><span class="n">result</span><span class="p">;</span>
3646     <span class="n">runpy</span> <span class="o">=</span> <span class="n">PyImport_ImportModule</span><span class="p">(</span><span class="s">&quot;runpy&quot;</span><span class="p">);</span>
3647  <span class="p">...</span>
3648     <span class="n">runmodule</span> <span class="o">=</span> <span class="n">PyObject_GetAttrString</span><span class="p">(</span><span class="n">runpy</span><span class="p">,</span> <span class="s">&quot;_run_module_as_main&quot;</span><span class="p">);</span>
3649  <span class="p">...</span>
3650     <span class="n">module</span> <span class="o">=</span> <span class="n">PyUnicode_FromWideChar</span><span class="p">(</span><span class="n">modname</span><span class="p">,</span> <span class="n">wcslen</span><span class="p">(</span><span class="n">modname</span><span class="p">));</span>
3651  <span class="p">...</span>
3652     <span class="n">runargs</span> <span class="o">=</span> <span class="n">Py_BuildValue</span><span class="p">(</span><span class="s">&quot;(Oi)&quot;</span><span class="p">,</span> <span class="n">module</span><span class="p">,</span> <span class="n">set_argv0</span><span class="p">);</span>
3653  <span class="p">...</span>
3654     <span class="n">result</span> <span class="o">=</span> <span class="n">PyObject_Call</span><span class="p">(</span><span class="n">runmodule</span><span class="p">,</span> <span class="n">runargs</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
3655  <span class="p">...</span>
3656     <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
3657         <span class="k">return</span> <span class="n">pymain_exit_err_print</span><span class="p">();</span>
3658     <span class="p">}</span>
3659     <span class="n">Py_DECREF</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
3660     <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
3661 <span class="p">}</span>
3662 </pre></div>
3663 
<p>In this function, you&rsquo;ll also see two other C API functions: <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/call.c#L214"><code>PyObject_Call()</code></a> and <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L831"><code>PyObject_GetAttrString()</code></a>. <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/import.c#L1409"><code>PyImport_ImportModule()</code></a> returns a <code>PyObject*</code>, the core object type. As a result, you need special functions to get the object&rsquo;s attributes and to call them.</p>
<p>In Python, if you had an object and wanted to get an attribute, then you could call <code>getattr()</code>. In the C API, this call is <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/object.c#L831"><code>PyObject_GetAttrString()</code></a>, which is found in <code>Objects/object.c</code>. To run a callable in Python, you add parentheses after it or, equivalently, call its <code>__call__()</code> method. In the C API, this is <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/call.c#L214"><code>PyObject_Call()</code></a>, which is implemented inside <code>Objects/call.c</code>:</p>
3666 <div class="highlight python"><pre><span></span><span class="n">hi</span> <span class="o">=</span> <span class="s2">&quot;hi!&quot;</span>
3667 <span class="n">hi</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span> <span class="o">==</span> <span class="n">hi</span><span class="o">.</span><span class="n">upper</span><span class="o">.</span><span class="fm">__call__</span><span class="p">()</span>  <span class="c1"># this is the same</span>
3668 </pre></div>
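<p>Each C API call in <code>pymain_run_module()</code> has a Python counterpart. The sketch below mirrors the sequence step by step, with a comment naming the C call each line stands in for. To avoid side effects, it imports <code>math</code> and calls <code>math.sqrt()</code> rather than <code>runpy._run_module_as_main()</code>:</p>
<div class="highlight python"><pre>import importlib

# PyImport_ImportModule("runpy")
module = importlib.import_module("math")

# PyObject_GetAttrString(runpy, "_run_module_as_main")
func = getattr(module, "sqrt")

# Py_BuildValue("(Oi)", module, set_argv0) builds an argument tuple
args = (16,)

# PyObject_Call(runmodule, runargs, NULL) calls the object with that tuple
result = func.__call__(*args)
print(result)  # 4.0
</pre></div>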
3669 
3670 <p>The <code>runpy</code> module is written in pure Python and located in <code>Lib/runpy.py</code>.</p>
3671 <p>Executing <code>python -m &lt;module&gt;</code> is equivalent to running <code>python -m runpy &lt;module&gt;</code>. The <code>runpy</code> module was created to abstract the process of locating and executing modules on an operating system.</p>
3672 <p><code>runpy</code> does a few things to run the target module:</p>
3673 <ul>
3674 <li>Calls <code>__import__()</code> for the module name you provided</li>
<li>Sets the module&rsquo;s <code>__name__</code> to <code>__main__</code></li>
3676 <li>Executes the module within the <code>__main__</code> namespace</li>
3677 </ul>
3678 <p>The <code>runpy</code> module also supports executing directories and zip files.</p>
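<p>The public <code>runpy.run_module()</code> and <code>runpy.run_path()</code> functions expose this machinery to Python code. The sketch below writes two things into a temporary folder: a throwaway module, <code>greet.py</code>, and a directory, <code>app/</code>, containing <code>__main__.py</code>. Both names are made up for the example. The sketch then runs each with <code>run_name="__main__"</code> and checks which <code>__name__</code> the code saw:</p>
<div class="highlight python"><pre>import os
import runpy
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A hypothetical module, greet.py, that records the name it ran under
    with open(os.path.join(tmp, "greet.py"), "w") as f:
        f.write("ran_as = __name__\n")

    # A hypothetical directory, app/, with a __main__.py entry point
    os.mkdir(os.path.join(tmp, "app"))
    with open(os.path.join(tmp, "app", "__main__.py"), "w") as f:
        f.write("ran_as = __name__\n")

    sys.path.insert(0, tmp)  # let run_module() find greet.py on sys.path
    try:
        # Like python -m greet: locate the module and run it as __main__
        module_globals = runpy.run_module("greet", run_name="__main__")
    finally:
        sys.path.remove(tmp)

    # Like python app/: run the directory's __main__.py
    dir_globals = runpy.run_path(os.path.join(tmp, "app"), run_name="__main__")

print(module_globals["ran_as"])  # __main__
print(dir_globals["ran_as"])     # __main__
</pre></div>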
3679 <h4 id="input-via-filename">Input via Filename</h4>
<p>If the first argument to <code>python</code> is a filename, such as <code>python test.py</code>, then CPython opens a file handle, similar to using <code>open()</code> in Python, and passes the handle to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L372"><code>PyRun_SimpleFileExFlags()</code></a> inside <code>Python/pythonrun.c</code>.</p>
<p>There are three paths this function can take:</p>
3682 <ol>
3683 <li>If the file path is a <code>.pyc</code> file, it will call <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1145"><code>run_pyc_file()</code></a>.</li>
<li>If the file path is a script file (<code>.py</code>), it will run <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1032"><code>PyRun_FileExFlags()</code></a>.</li>
<li>If the input is <code>stdin</code>, because the user ran <code>command | python</code>, it will treat <code>stdin</code> as a file handle and run <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1032"><code>PyRun_FileExFlags()</code></a>.</li>
3686 </ol>
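<p>The choice between these paths can be sketched in Python. The <code>choose_runner()</code> function below is hypothetical and simplified. It assumes the decision checks for a <code>.pyc</code> extension first and otherwise compares the file&rsquo;s first two bytes with the bytecode magic number. It returns the name of the C function that would run the file. The real <code>maybe_pyc_file()</code> helper called in the C code below has extra conditions that this sketch ignores:</p>
<div class="highlight python"><pre>import importlib.util


def choose_runner(filename, first_two_bytes):
    """Hypothetical, simplified version of the dispatch in PyRun_SimpleFileExFlags()."""
    # A .pyc extension is taken as compiled bytecode outright
    if filename.endswith(".pyc"):
        return "run_pyc_file"
    # Otherwise, a file starting with the first half of the magic number is also bytecode
    if first_two_bytes == importlib.util.MAGIC_NUMBER[:2]:
        return "run_pyc_file"
    # Everything else, including piped stdin, is parsed as source code
    return "PyRun_FileExFlags"


print(choose_runner("test.pyc", b""))   # run_pyc_file
print(choose_runner("test.py", b"pr"))  # PyRun_FileExFlags
print(choose_runner("script", importlib.util.MAGIC_NUMBER[:2]))  # run_pyc_file
</pre></div>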
3687 <div class="highlight c"><pre><span></span><span class="kt">int</span>
3688 <span class="nf">PyRun_SimpleFileExFlags</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="kt">int</span> <span class="n">closeit</span><span class="p">,</span>
3689                         <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
3690 <span class="p">{</span>
3691  <span class="p">...</span>
3692     <span class="n">m</span> <span class="o">=</span> <span class="n">PyImport_AddModule</span><span class="p">(</span><span class="s">&quot;__main__&quot;</span><span class="p">);</span>
3693  <span class="p">...</span>
3694 <span class="hll">    <span class="k">if</span> <span class="p">(</span><span class="n">maybe_pyc_file</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">ext</span><span class="p">,</span> <span class="n">closeit</span><span class="p">))</span> <span class="p">{</span>
3695 </span> <span class="p">...</span>
3696 <span class="hll">        <span class="n">v</span> <span class="o">=</span> <span class="n">run_pyc_file</span><span class="p">(</span><span class="n">pyc_fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
3697 </span>    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
3698         <span class="cm">/* When running from stdin, leave __main__.__loader__ alone */</span>
3699         <span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="s">&quot;&lt;stdin&gt;&quot;</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span>
3700             <span class="n">set_main_loader</span><span class="p">(</span><span class="n">d</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="s">&quot;SourceFileLoader&quot;</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
3701             <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&quot;python: failed to set __main__.__loader__</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span>
3702             <span class="n">ret</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
3703             <span class="k">goto</span> <span class="n">done</span><span class="p">;</span>
3704         <span class="p">}</span>
3705 <span class="hll">        <span class="n">v</span> <span class="o">=</span> <span class="n">PyRun_FileExFlags</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">Py_file_input</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span>
3706 </span><span class="hll">                              <span class="n">closeit</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
3707 </span>    <span class="p">}</span>
3708  <span class="p">...</span>
3709     <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
3710 <span class="p">}</span>
3711 </pre></div>
3712 
3713 <h4 id="input-via-file-with-pyrun_fileexflags">Input via File With <code>PyRun_FileExFlags()</code></h4>
3714 <p>For <code>stdin</code> and basic script files, CPython will pass the file handle to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1032"><code>PyRun_FileExFlags()</code></a> located in the <code>pythonrun.c</code> file.</p>
<p>The purpose of <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1032"><code>PyRun_FileExFlags()</code></a> is similar to that of <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1008"><code>PyRun_StringFlags()</code></a>, used for the <code>-c</code> input: CPython loads the file handle into <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1369"><code>PyParser_ASTFromFileObject()</code></a> to build an AST. We&rsquo;ll cover the Parser and AST modules in the next section.
There&rsquo;s no <code>PyImport_AddModule("__main__");</code> call here because the caller, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L372"><code>PyRun_SimpleFileExFlags()</code></a>, has already set up <code>__main__</code> and passes its dictionary in as the globals and locals:</p>
3717 <div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
3718 <span class="nf">PyRun_FileExFlags</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">filename_str</span><span class="p">,</span> <span class="kt">int</span> <span class="n">start</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span>
3719                   <span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span> <span class="kt">int</span> <span class="n">closeit</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
3720 <span class="p">{</span>
3721  <span class="p">...</span>
3722     <span class="n">mod</span> <span class="o">=</span> <span class="n">PyParser_ASTFromFileObject</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span>
3723                                      <span class="n">flags</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
3724  <span class="p">...</span>
3725     <span class="n">ret</span> <span class="o">=</span> <span class="n">run_mod</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
3726 <span class="p">}</span>
3727 </pre></div>
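<p>A Python sketch of the same sequence reads a script from a file, parses it, compiles it, and executes it in a namespace that stands in for <code>__main__.__dict__</code>. The script path and contents are hypothetical. Unlike the C parser, which reads from the file handle, <code>ast.parse()</code> works on the whole source string. Notice that the real filename ends up on the code object, and that filename is what tracebacks report:</p>
<div class="highlight python"><pre>import ast
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.py")  # hypothetical script
    with open(path, "w") as f:
        f.write("x = 6 * 7\n")

    # Parse the file's contents into an AST, like PyParser_ASTFromFileObject()
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)

# Compile and execute, as run_mod() does; the filename is recorded on the code object
code = compile(tree, path, "exec")
namespace = {"__name__": "__main__"}  # stands in for the globals and locals passed in
exec(code, namespace, namespace)

print(code.co_filename == path)  # True
print(namespace["x"])            # 42
</pre></div>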
3728 
<p>Just like <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1008"><code>PyRun_StringFlags()</code></a>, once <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1032"><code>PyRun_FileExFlags()</code></a> has parsed the file into an AST module, it sends that module to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1125"><code>run_mod()</code></a> to be executed.</p>
<p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1125"><code>run_mod()</code></a>, also found within <code>Python/pythonrun.c</code>, passes the AST module to the compiler function <code>PyAST_CompileObject()</code>, which turns it into a code object. Code objects hold the bytecode operations, and they are the format stored in <code>.pyc</code> files:</p>
3731 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span>
3732 <span class="nf">run_mod</span><span class="p">(</span><span class="n">mod_ty</span> <span class="n">mod</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span>
3733             <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">,</span> <span class="n">PyArena</span> <span class="o">*</span><span class="n">arena</span><span class="p">)</span>
3734 <span class="p">{</span>
3735     <span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span><span class="p">;</span>
3736     <span class="n">PyObject</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
3737     <span class="n">co</span> <span class="o">=</span> <span class="n">PyAST_CompileObject</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
3738     <span class="k">if</span> <span class="p">(</span><span class="n">co</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
3739         <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
3740 
3741     <span class="k">if</span> <span class="p">(</span><span class="n">PySys_Audit</span><span class="p">(</span><span class="s">&quot;exec&quot;</span><span class="p">,</span> <span class="s">&quot;O&quot;</span><span class="p">,</span> <span class="n">co</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
3742         <span class="n">Py_DECREF</span><span class="p">(</span><span class="n">co</span><span class="p">);</span>
3743         <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
3744     <span class="p">}</span>
3745 
3746     <span class="n">v</span> <span class="o">=</span> <span class="n">run_eval_code_obj</span><span class="p">(</span><span class="n">co</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">);</span>
3747     <span class="n">Py_DECREF</span><span class="p">(</span><span class="n">co</span><span class="p">);</span>
3748     <span class="k">return</span> <span class="n">v</span><span class="p">;</span>
3749 <span class="p">}</span>
3750 </pre></div>
3751 
<p>We will cover the CPython compiler and bytecodes in the next section. The call to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a> is a simple wrapper function that calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> in the <code>Python/ceval.c</code> file. <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> is the entry point to CPython&rsquo;s main evaluation loop, which iterates over each bytecode instruction and executes it.</p>
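<p>You can list the bytecode instructions that the evaluation loop steps through with the <code>dis</code> module. In this sketch, <code>exec()</code> plays the part of <code>run_eval_code_obj()</code> by handing the compiled code object to the evaluation loop. The exact instruction names printed vary between Python versions:</p>
<div class="highlight python"><pre>import dis

code = compile("x = 6 * 7", "example", "exec")

# The instructions the evaluation loop will step through for this code object
ops = [instr.opname for instr in dis.get_instructions(code)]
print(ops)

# exec() hands the code object to the evaluation loop, like run_eval_code_obj()
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 42
</pre></div>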
3753 <h4 id="input-via-compiled-bytecode-with-run_pyc_file">Input via Compiled Bytecode With <code>run_pyc_file()</code></h4>
<p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L372"><code>PyRun_SimpleFileExFlags()</code></a> includes a clause for when the user provides a path to a <code>.pyc</code> file. If the path ends in <code>.pyc</code>, CPython doesn&rsquo;t load the file as plain text and parse it. Instead, it assumes that the file contains a code object written to disk.</p>
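<p>The sketch below shows what loading a <code>.pyc</code> file involves, using only the standard library. It compiles a hypothetical <code>test.py</code> with <code>py_compile</code> and skips the 16-byte header that <code>.pyc</code> files have used since Python 3.7 (PEP 552). It then reads the code object back with <code>marshal.load()</code>. This is roughly what <code>run_pyc_file()</code> does with the file handle before evaluating the code:</p>
<div class="highlight python"><pre>import importlib.util
import marshal
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.py")  # hypothetical script
    with open(src, "w") as f:
        f.write("x = 6 * 7\n")

    # Write the compiled code object to disk as a .pyc file
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "test.pyc"), doraise=True)

    with open(pyc, "rb") as f:
        header = f.read(16)     # magic number, flags, and source metadata
        code = marshal.load(f)  # unmarshal the code object

print(header[:4] == importlib.util.MAGIC_NUMBER)  # True

# Evaluate the loaded code object, as run_eval_code_obj() would
namespace = {"__name__": "__main__"}
exec(code, namespace)
print(namespace["x"])  # 42
</pre></div>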
<p>The <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1145"><code>run_pyc_file()</code></a> function inside <code>Python/pythonrun.c</code> then unmarshals the code object from the <code>.pyc</code> file using the file handle. <strong>Marshaling</strong> means serializing objects into a byte format that can be written to a file. Unmarshaling reads those bytes back into memory and rebuilds the original data structure. Storing code objects on disk this way is how the CPython compiler caches compiled code, so that it doesn&rsquo;t need to parse the source every time the script is called:</p>
3756 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span>
3757 <span class="nf">run_pyc_file</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span>
3758              <span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
3759 <span class="p">{</span>
3760     <span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span><span class="p">;</span>
3761     <span class="n">PyObject</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
3762   <span class="p">...</span>
3763 <span class="hll">    <span class="n">v</span> <span class="o">=</span> <span class="n">PyMarshal_ReadLastObjectFromFile</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
3764 </span>  <span class="p">...</span>
3765     <span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="o">||</span> <span class="o">!</span><span class="n">PyCode_Check</span><span class="p">(</span><span class="n">v</span><span class="p">))</span> <span class="p">{</span>
3766         <span class="n">Py_XDECREF</span><span class="p">(</span><span class="n">v</span><span class="p">);</span>
3767         <span class="n">PyErr_SetString</span><span class="p">(</span><span class="n">PyExc_RuntimeError</span><span class="p">,</span>
3768                    <span class="s">&quot;Bad code object in .pyc file&quot;</span><span class="p">);</span>
3769         <span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
3770     <span class="p">}</span>
3771     <span class="n">fclose</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
3772 <span class="hll">    <span class="n">co</span> <span class="o">=</span> <span class="p">(</span><span class="n">PyCodeObject</span> <span class="o">*</span><span class="p">)</span><span class="n">v</span><span class="p">;</span>
3773 </span><span class="hll">    <span class="n">v</span> <span class="o">=</span> <span class="n">run_eval_code_obj</span><span class="p">(</span><span class="n">co</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">);</span>
3774 </span>    <span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;&amp;</span> <span class="n">flags</span><span class="p">)</span>
3775         <span class="n">flags</span><span class="o">-&gt;</span><span class="n">cf_flags</span> <span class="o">|=</span> <span class="p">(</span><span class="n">co</span><span class="o">-&gt;</span><span class="n">co_flags</span> <span class="o">&amp;</span> <span class="n">PyCF_MASK</span><span class="p">);</span>
3776     <span class="n">Py_DECREF</span><span class="p">(</span><span class="n">co</span><span class="p">);</span>
3777     <span class="k">return</span> <span class="n">v</span><span class="p">;</span>
3778 <span class="p">}</span>
3779 </pre></div>
3780 
3781 <p>Once the code object has been unmarshaled into memory, it is sent to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a>, which calls into <code>Python/ceval.c</code> to execute the code.</p>
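<p>As a minimal sketch of the same steps in Python, the following code compiles a throwaway module with <code>py_compile</code>, skips the <code>.pyc</code> header, unmarshals the code object with <code>marshal.load()</code>, and executes it. It assumes the 16-byte header layout used since Python 3.7 (PEP 552): a 4-byte magic number, 4 bytes of flags, and 8 bytes of modification time and size or of source hash. The <code>load_pyc()</code> helper and the <code>hello.py</code> file are hypothetical, not part of CPython:</p>

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile
import types


def load_pyc(path):
    """Return the code object stored in a .pyc file (hypothetical helper).

    Assumes the PEP 552 header used since Python 3.7: 4-byte magic number,
    4-byte flags field, then 8 bytes of mtime/size or source hash.
    """
    with open(path, "rb") as f:
        if f.read(4) != importlib.util.MAGIC_NUMBER:
            raise RuntimeError("Bad magic number in .pyc file")
        f.read(12)  # skip the flags and the mtime/size (or hash) fields
        obj = marshal.load(f)  # unmarshal the code object into memory
    if not isinstance(obj, types.CodeType):
        # Mirrors the PyCode_Check() failure branch in run_pyc_file()
        raise RuntimeError("Bad code object in .pyc file")
    return obj


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.py")
    with open(src, "w") as f:
        f.write("result = 6 * 7\n")
    pyc_path = py_compile.compile(src, cfile=os.path.join(tmp, "hello.pyc"), doraise=True)
    code_obj = load_pyc(pyc_path)

module_ns = {}
exec(code_obj, module_ns)  # roughly the run_eval_code_obj() step
print(module_ns["result"])  # 42
```

<p>Reading the header explicitly, instead of trusting the file, is what lets a loader reject a <code>.pyc</code> written by a different Python version before it ever reaches the evaluation loop.</p>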
3782 <h3 id="lexing-and-parsing">Lexing and Parsing</h3>
3783 <p>In the exploration of reading and executing Python files, we dived as deep as the parser and AST modules, with function calls to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1369"><code>PyParser_ASTFromFileObject()</code></a>.</p>
3784 <p>Sticking within <code>Python/pythonrun.c</code>, the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1369"><code>PyParser_ASTFromFileObject()</code></a> function takes a file handle, compiler flags, and a <code>PyArena</code> instance, and converts the file object into a node object using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L163"><code>PyParser_ParseFileObject()</code></a>.</p>
3785 <p>It then converts the node object into a module using the AST function <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L772"><code>PyAST_FromNodeObject()</code></a>:</p>
3786 <div class="highlight c"><pre><span></span><span class="n">mod_ty</span>
3787 <span class="nf">PyParser_ASTFromFileObject</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">enc</span><span class="p">,</span>
3788                            <span class="kt">int</span> <span class="n">start</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ps1</span><span class="p">,</span>
3789                            <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ps2</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">errcode</span><span class="p">,</span>
3790                            <span class="n">PyArena</span> <span class="o">*</span><span class="n">arena</span><span class="p">)</span>
3791 <span class="p">{</span>
3792     <span class="p">...</span>
3793 <span class="hll">    <span class="n">node</span> <span class="o">*</span><span class="n">n</span> <span class="o">=</span> <span class="n">PyParser_ParseFileObject</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">enc</span><span class="p">,</span>
3794 </span><span class="hll">                                       <span class="o">&amp;</span><span class="n">_PyParser_Grammar</span><span class="p">,</span>
3795 </span><span class="hll">                                       <span class="n">start</span><span class="p">,</span> <span class="n">ps1</span><span class="p">,</span> <span class="n">ps2</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">err</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">iflags</span><span class="p">);</span>
3796 </span>    <span class="p">...</span>
3797     <span class="k">if</span> <span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="p">{</span>
3798 <span class="hll">        <span class="n">flags</span><span class="o">-&gt;</span><span class="n">cf_flags</span> <span class="o">|=</span> <span class="n">iflags</span> <span class="o">&amp;</span> <span class="n">PyCF_MASK</span><span class="p">;</span>
3799 </span>        <span class="n">mod</span> <span class="o">=</span> <span class="n">PyAST_FromNodeObject</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
3800         <span class="n">PyNode_Free</span><span class="p">(</span><span class="n">n</span><span class="p">);</span>
3801     <span class="p">...</span>
3802     <span class="k">return</span> <span class="n">mod</span><span class="p">;</span>
3803 <span class="p">}</span>
3804 </pre></div>
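<p>The <code>start</code> argument selects which grammar start symbol the parser begins from. From Python, the same choice is exposed as the <code>mode</code> argument of <code>compile()</code> and <code>ast.parse()</code>: <code>'exec'</code> corresponds to <code>Py_file_input</code>, <code>'eval'</code> to <code>Py_eval_input</code>, and <code>'single'</code> to <code>Py_single_input</code>. This short sketch shows which kind of module node each mode produces. Note that the PEG parser that became the default in Python 3.9 builds the AST directly, so on current versions no <code>node</code> tree is created along the way:</p>

```python
import ast

source = "a + 1\n"

# Map each compile()/ast.parse() mode to the kind of root node it produces
kinds = {}
for mode in ("exec", "eval", "single"):
    tree = ast.parse(source, filename="example.py", mode=mode)
    kinds[mode] = type(tree).__name__

print(kinds)  # {'exec': 'Module', 'eval': 'Expression', 'single': 'Interactive'}
```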
3805 
3806 <p>For <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L163"><code>PyParser_ParseFileObject()</code></a> we switch to <code>Parser/parsetok.c</code> and the parser-tokenizer stage of the CPython interpreter. This function has two important tasks:</p>
3807 <ol>
3808 <li>Instantiate a tokenizer state <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/tokenizer.h#L23"><code>tok_state</code></a> using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/tokenizer.h#L78"><code>PyTokenizer_FromFile()</code></a> in <code>Parser/tokenizer.c</code></li>
3809 <li>Convert the tokens into a concrete parse tree (a tree of <code>node</code> structures) using <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L232"><code>parsetok()</code></a> in <code>Parser/parsetok.c</code></li>
3810 </ol>
3811 <div class="highlight c"><pre><span></span><span class="n">node</span> <span class="o">*</span>
3812 <span class="nf">PyParser_ParseFileObject</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span>
3813                          <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">enc</span><span class="p">,</span> <span class="n">grammar</span> <span class="o">*</span><span class="n">g</span><span class="p">,</span> <span class="kt">int</span> <span class="n">start</span><span class="p">,</span>
3814                          <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ps1</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ps2</span><span class="p">,</span>
3815                          <span class="n">perrdetail</span> <span class="o">*</span><span class="n">err_ret</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
3816 <span class="p">{</span>
3817 <span class="hll">    <span class="k">struct</span> <span class="n">tok_state</span> <span class="o">*</span><span class="n">tok</span><span class="p">;</span>
3818 </span><span class="p">...</span>
3819 <span class="hll">    <span class="k">if</span> <span class="p">((</span><span class="n">tok</span> <span class="o">=</span> <span class="n">PyTokenizer_FromFile</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">enc</span><span class="p">,</span> <span class="n">ps1</span><span class="p">,</span> <span class="n">ps2</span><span class="p">))</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
3820 </span>        <span class="n">err_ret</span><span class="o">-&gt;</span><span class="n">error</span> <span class="o">=</span> <span class="n">E_NOMEM</span><span class="p">;</span>
3821         <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
3822     <span class="p">}</span>
3823 <span class="p">...</span>
3824 <span class="hll">    <span class="k">return</span> <span class="n">parsetok</span><span class="p">(</span><span class="n">tok</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">err_ret</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
3825 </span><span class="p">}</span>
3826 </pre></div>
3827 
3828 <p><code>tok_state</code> (defined in <code>Parser/tokenizer.h</code>) is the data structure that stores all the temporary data generated by the tokenizer. It is returned to the parser-tokenizer because <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L232"><code>parsetok()</code></a> needs it to build the concrete syntax tree.</p>
3829 <p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L232"><code>parsetok()</code></a> uses the <code>tok_state</code> structure and calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/tokenizer.c#L1110"><code>tok_get()</code></a> in a loop until the file is exhausted and no more tokens can be found.</p>
3830 <p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/tokenizer.c#L1110"><code>tok_get()</code></a>, defined in <code>Parser/tokenizer.c</code>, behaves like an iterator: each call returns the next token from the input.</p>
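<p>The standard library&rsquo;s <code>tokenize</code> module offers a Python-level view of the same idea. <code>tokenize.generate_tokens()</code> takes a <code>readline</code> callable and returns a generator, so tokens are produced one at a time on demand, much like repeated calls to <code>tok_get()</code>. The small sketch below tokenizes <code>a + 1</code>. Note that <code>tokenize</code> reports operators with the generic <code>OP</code> type, and the <code>exact_type</code> attribute recovers the specific token, such as <code>PLUS</code>:</p>

```python
import io
import token
import tokenize

source = "a + 1\n"

# generate_tokens() returns a generator that yields TokenInfo tuples lazily
tokens = tokenize.generate_tokens(io.StringIO(source).readline)

first = next(tokens)  # pull a single token, like one call to tok_get()
print(token.tok_name[first.exact_type], repr(first.string))  # NAME 'a'

# Drain the rest; exact_type maps the generic OP type to e.g. PLUS
rest_names = [token.tok_name[t.exact_type] for t in tokens]
print(rest_names)  # ['PLUS', 'NUMBER', 'NEWLINE', 'ENDMARKER']
```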
3831 <p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/tokenizer.c#L1110"><code>tok_get()</code></a> is one of the most complex functions in the whole CPython codebase. It is over 640 lines long and carries decades of accumulated edge cases, new language features, and syntax.</p>
3832 <p>One of the simpler parts is the code that converts a line break into a <code>NEWLINE</code> token:</p>
3833 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
3834 <span class="nf">tok_get</span><span class="p">(</span><span class="k">struct</span> <span class="n">tok_state</span> <span class="o">*</span><span class="n">tok</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">p_start</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">p_end</span><span class="p">)</span>
3835 <span class="p">{</span>
3836 <span class="p">...</span>
3837     <span class="cm">/* Newline */</span>
3838     <span class="k">if</span> <span class="p">(</span><span class="n">c</span> <span class="o">==</span> <span class="sc">&#39;\n&#39;</span><span class="p">)</span> <span class="p">{</span>
3839         <span class="n">tok</span><span class="o">-&gt;</span><span class="n">atbol</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
3840         <span class="k">if</span> <span class="p">(</span><span class="n">blankline</span> <span class="o">||</span> <span class="n">tok</span><span class="o">-&gt;</span><span class="n">level</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
3841             <span class="k">goto</span> <span class="n">nextline</span><span class="p">;</span>
3842         <span class="p">}</span>
3843         <span class="o">*</span><span class="n">p_start</span> <span class="o">=</span> <span class="n">tok</span><span class="o">-&gt;</span><span class="n">start</span><span class="p">;</span>
3844         <span class="o">*</span><span class="n">p_end</span> <span class="o">=</span> <span class="n">tok</span><span class="o">-&gt;</span><span class="n">cur</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="cm">/* Leave &#39;\n&#39; out of the string */</span>
3845         <span class="n">tok</span><span class="o">-&gt;</span><span class="n">cont_line</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
3846         <span class="k">if</span> <span class="p">(</span><span class="n">tok</span><span class="o">-&gt;</span><span class="n">async_def</span><span class="p">)</span> <span class="p">{</span>
3847             <span class="cm">/* We&#39;re somewhere inside an &#39;async def&#39; function, and</span>
3848 <span class="cm">               we&#39;ve encountered a NEWLINE after its signature. */</span>
3849             <span class="n">tok</span><span class="o">-&gt;</span><span class="n">async_def_nl</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
3850         <span class="p">}</span>
3851         <span class="k">return</span> <span class="n">NEWLINE</span><span class="p">;</span>
3852     <span class="p">}</span>
3853 <span class="p">...</span>
3854 <span class="p">}</span>
3855 </pre></div>
3856 
3857 <p>In this case, <code>NEWLINE</code> is a token, with a value defined in <code>Include/token.h</code>. All tokens are constant <code>int</code> values, and the <code>Include/token.h</code> file was generated earlier when we ran <code>make regen-grammar</code>.</p>
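<p>The <code>goto nextline</code> branch above is why a line break inside brackets (<code>tok-&gt;level &gt; 0</code>) or on a blank line does not produce a <code>NEWLINE</code> token. The Python <code>tokenize</code> module makes this visible: instead of dropping those line breaks, it reports them as <code>NL</code> tokens, while the end of each logical line becomes <code>NEWLINE</code>, whose constant value is available as <code>token.NEWLINE</code>. A quick check, using a made-up source string:</p>

```python
import io
import token
import tokenize

# One logical line split across two physical lines, a blank line, then another logical line
source = "x = (1,\n     2)\n\ny = 3\n"

names = [token.tok_name[t.type] for t in tokenize.generate_tokens(io.StringIO(source).readline)]

print(token.NEWLINE)           # 4
print(names.count("NEWLINE"))  # 2: one per logical line
print(names.count("NL"))       # line breaks inside brackets or on blank lines
```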
3858 <p>The <code>node</code> type returned by <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Parser/parsetok.c#L163"><code>PyParser_ParseFileObject()</code></a> is going to be essential for the next stage, converting a parse tree into an Abstract Syntax Tree (AST):</p>
3859 <div class="highlight c"><pre><span></span><span class="k">typedef</span> <span class="k">struct</span> <span class="n">_node</span> <span class="p">{</span>
3860     <span class="kt">short</span>               <span class="n">n_type</span><span class="p">;</span>
3861     <span class="kt">char</span>                <span class="o">*</span><span class="n">n_str</span><span class="p">;</span>
3862     <span class="kt">int</span>                 <span class="n">n_lineno</span><span class="p">;</span>
3863     <span class="kt">int</span>                 <span class="n">n_col_offset</span><span class="p">;</span>
3864     <span class="kt">int</span>                 <span class="n">n_nchildren</span><span class="p">;</span>
3865     <span class="k">struct</span> <span class="n">_node</span>        <span class="o">*</span><span class="n">n_child</span><span class="p">;</span>
3866     <span class="kt">int</span>                 <span class="n">n_end_lineno</span><span class="p">;</span>
3867     <span class="kt">int</span>                 <span class="n">n_end_col_offset</span><span class="p">;</span>
3868 <span class="p">}</span> <span class="n">node</span><span class="p">;</span>
3869 </pre></div>
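<p>To make the shape of this structure concrete, here is a hypothetical Python mirror of a subset of the <code>node</code> struct&rsquo;s fields, built from the nested-list format that <code>parser.st2list()</code> produces (shown further down): <code>[type, string]</code> for a token and <code>[type, child, child, ...]</code> for a grammar symbol. The <code>Node</code> class and the <code>from_list()</code> and <code>depth()</code> helpers are illustrative only and are not part of CPython. The fragment used is the branch for the name <code>a</code> in the parse tree of <code>a + 1</code>:</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical Python mirror of a subset of the C node struct's fields."""
    n_type: int
    n_str: Optional[str] = None
    n_child: List["Node"] = field(default_factory=list)

    @property
    def n_nchildren(self) -> int:
        return len(self.n_child)


def from_list(item):
    """Build a Node tree from parser.st2list()-style nested lists."""
    n_type, *rest = item
    if rest and isinstance(rest[0], str):
        return Node(n_type, n_str=rest[0])  # token: [type, string]
    return Node(n_type, n_child=[from_list(child) for child in rest])  # symbol


def depth(node):
    """Number of levels from this node down to its deepest leaf."""
    return 1 + max((depth(child) for child in node.n_child), default=0)


# The branch for the name `a` in the CST of `a + 1`
fragment = [321, [322, [323, [324, [325, [1, "a"]]]]]]
tree = from_list(fragment)
print(depth(tree))  # 6: five symbol nodes wrapping a single NAME token
```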
3870 
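3871 <p>Because the CST mirrors the grammar rule for rule, with nodes for syntax, token IDs, and symbols, even a short expression produces a deeply nested tree. That makes it difficult for the compiler to make quick decisions based on what the Python code actually means.</p>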
3872 <p>That is why the next stage is to convert the CST into an AST, a much higher-level structure. This task is performed by the <code>Python/ast.c</code> module, which has both a C and Python API.</p>
3873 <p>Before you jump into the AST, there is a way to access the output from the parser stage. CPython has a standard library module, <code>parser</code>, which exposes the C functions with a Python API. The <code>parser</code> module was deprecated in Python 3.9 and removed in Python 3.10 (along with the <code>symbol</code> module), so the examples in this section need Python 3.9 or earlier.</p>
3874 <p>The module is documented as an implementation detail of CPython, so you won&rsquo;t find it in other Python interpreters. Also, the output from its functions is not easy to read.</p>
3875 <p>The output is in numeric form, using the token and symbol numbers generated by the <code>make regen-grammar</code> stage and stored in <code>Include/token.h</code> and <code>Include/graminit.h</code>:</p>
3876 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pprint</span> <span class="k">import</span> <span class="n">pprint</span>
3877 <span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">parser</span>
3878 <span class="gp">&gt;&gt;&gt; </span><span class="n">st</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">expr</span><span class="p">(</span><span class="s1">&#39;a + 1&#39;</span><span class="p">)</span>
3879 <span class="gp">&gt;&gt;&gt; </span><span class="n">pprint</span><span class="p">(</span><span class="n">parser</span><span class="o">.</span><span class="n">st2list</span><span class="p">(</span><span class="n">st</span><span class="p">))</span>
3880 <span class="go">[258,</span>
3881 <span class="go"> [332,</span>
3882 <span class="go">  [306,</span>
3883 <span class="go">   [310,</span>
3884 <span class="go">    [311,</span>
3885 <span class="go">     [312,</span>
3886 <span class="go">      [313,</span>
3887 <span class="go">       [316,</span>
3888 <span class="go">        [317,</span>
3889 <span class="go">         [318,</span>
3890 <span class="go">          [319,</span>
3891 <span class="go">           [320,</span>
3892 <span class="go">            [321, [322, [323, [324, [325, [1, &#39;a&#39;]]]]]],</span>
3893 <span class="go">            [14, &#39;+&#39;],</span>
3894 <span class="go">            [321, [322, [323, [324, [325, [2, &#39;1&#39;]]]]]]]]]]]]]]]]],</span>
3895 <span class="go"> [4, &#39;&#39;],</span>
3896 <span class="go"> [0, &#39;&#39;]]</span>
3897 </pre></div>
3898 
3899 <p>To make it easier to understand, you can take all the numbers in the <code>symbol</code> and <code>token</code> modules, put them into a dictionary and recursively replace the values in the output of <code>parser.st2list()</code> with the names:</p>
3900 <div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">symbol</span>
3901 <span class="kn">import</span> <span class="nn">token</span>
3902 <span class="kn">import</span> <span class="nn">parser</span>
3903 
3904 <span class="k">def</span> <span class="nf">lex</span><span class="p">(</span><span class="n">expression</span><span class="p">):</span>
3905     <span class="n">symbols</span> <span class="o">=</span> <span class="p">{</span><span class="n">v</span><span class="p">:</span> <span class="n">k</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">symbol</span><span class="o">.</span><span class="vm">__dict__</span><span class="o">.</span><span class="n">items</span><span class="p">()</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="nb">int</span><span class="p">)}</span>
3906     <span class="n">tokens</span> <span class="o">=</span> <span class="p">{</span><span class="n">v</span><span class="p">:</span> <span class="n">k</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">token</span><span class="o">.</span><span class="vm">__dict__</span><span class="o">.</span><span class="n">items</span><span class="p">()</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="nb">int</span><span class="p">)}</span>
3907     <span class="n">lexicon</span> <span class="o">=</span> <span class="p">{</span><span class="o">**</span><span class="n">symbols</span><span class="p">,</span> <span class="o">**</span><span class="n">tokens</span><span class="p">}</span>
3908     <span class="n">st</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">expr</span><span class="p">(</span><span class="n">expression</span><span class="p">)</span>
3909     <span class="n">st_list</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">st2list</span><span class="p">(</span><span class="n">st</span><span class="p">)</span>
3910 
3911     <span class="k">def</span> <span class="nf">replace</span><span class="p">(</span><span class="n">l</span><span class="p">:</span> <span class="nb">list</span><span class="p">):</span>
3912         <span class="n">r</span> <span class="o">=</span> <span class="p">[]</span>
3913         <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">l</span><span class="p">:</span>
3914             <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="nb">list</span><span class="p">):</span>
3915                 <span class="n">r</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">replace</span><span class="p">(</span><span class="n">i</span><span class="p">))</span>
3916             <span class="k">else</span><span class="p">:</span>
3917                 <span class="k">if</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">lexicon</span><span class="p">:</span>
3918                     <span class="n">r</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">lexicon</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
3919                 <span class="k">else</span><span class="p">:</span>
3920                     <span class="n">r</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
3921         <span class="k">return</span> <span class="n">r</span>
3922 
3923     <span class="k">return</span> <span class="n">replace</span><span class="p">(</span><span class="n">st_list</span><span class="p">)</span>
3924 </pre></div>
3925 
3926 <p>You can run <code>lex()</code> with a simple expression, like <code>a + 1</code>, to see how it is represented as a parse tree:</p>
3927 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pprint</span> <span class="k">import</span> <span class="n">pprint</span>
3928 <span class="gp">&gt;&gt;&gt; </span><span class="n">pprint</span><span class="p">(</span><span class="n">lex</span><span class="p">(</span><span class="s1">&#39;a + 1&#39;</span><span class="p">))</span>
3929 
3930 <span class="go">[&#39;eval_input&#39;,</span>
3931 <span class="go"> [&#39;testlist&#39;,</span>
3932 <span class="go">  [&#39;test&#39;,</span>
3933 <span class="go">   [&#39;or_test&#39;,</span>
3934 <span class="go">    [&#39;and_test&#39;,</span>
3935 <span class="go">     [&#39;not_test&#39;,</span>
3936 <span class="go">      [&#39;comparison&#39;,</span>
3937 <span class="go">       [&#39;expr&#39;,</span>
3938 <span class="go">        [&#39;xor_expr&#39;,</span>
3939 <span class="go">         [&#39;and_expr&#39;,</span>
3940 <span class="go">          [&#39;shift_expr&#39;,</span>
3941 <span class="go">           [&#39;arith_expr&#39;,</span>
3942 <span class="go">            [&#39;term&#39;,</span>
3943 <span class="go">             [&#39;factor&#39;, [&#39;power&#39;, [&#39;atom_expr&#39;, [&#39;atom&#39;, [&#39;NAME&#39;, &#39;a&#39;]]]]]],</span>
3944 <span class="go">            [&#39;PLUS&#39;, &#39;+&#39;],</span>
3945 <span class="go">            [&#39;term&#39;,</span>
3946 <span class="go">             [&#39;factor&#39;,</span>
3947 <span class="go">              [&#39;power&#39;, [&#39;atom_expr&#39;, [&#39;atom&#39;, [&#39;NUMBER&#39;, &#39;1&#39;]]]]]]]]]]]]]]]]],</span>
3948 <span class="go"> [&#39;NEWLINE&#39;, &#39;&#39;],</span>
3949 <span class="go"> [&#39;ENDMARKER&#39;, &#39;&#39;]]</span>
3950 </pre></div>
3951 
3952 <p>In the output, you can see the symbols in lowercase, such as <code>'test'</code>, and the tokens in uppercase, such as <code>'NUMBER'</code>.</p>
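<p>The case difference follows from how the numbers are split: token numbers are below <code>token.NT_OFFSET</code> (256), and grammar symbol numbers start at 256, which is exactly what <code>token.ISTERMINAL()</code> and <code>token.ISNONTERMINAL()</code> test. The sketch below classifies the raw numbers from the <code>parser.st2list()</code> output shown earlier. It runs on current Python versions because it only needs the <code>token</code> module; on Python 3.9 or earlier, you could also look up the symbol names with <code>symbol.sym_name</code>:</p>

```python
import token

# Raw type numbers taken from the parser.st2list() output for 'a + 1'
numbers = [258, 332, 306, 321, 325, 1, 14, 2, 4, 0]

labels = {}
for n in numbers:
    if token.ISTERMINAL(n):
        # Below token.NT_OFFSET (256): a token, shown in uppercase by lex()
        labels[n] = token.tok_name[n]
    else:
        # 256 and above: a grammar symbol (non-terminal), shown in lowercase by lex()
        labels[n] = "symbol"

print(labels)
```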
3953 <h3 id="abstract-syntax-trees">Abstract Syntax Trees</h3>
3954 <p>The next stage in the CPython interpreter is to convert the CST generated by the parser into something more logical that can be executed. The structure is a higher-level representation of the code, called an Abstract Syntax Tree (AST).</p>
3955 <p>ASTs are produced inline with the CPython interpreter process, but you can also generate them from Python using the <code>ast</code> module in the standard library, as well as through the C API.</p>
3956 <p>Before diving into the C implementation of the AST, it would be useful to understand what an AST looks like for a simple piece of Python code.</p>
3957 <p>To do this, you can use <code>instaviz</code>, a simple app written for this tutorial. It displays the AST and bytecode instructions (which we&rsquo;ll cover later) in a web UI.</p>
3958 <p>To install <code>instaviz</code>:</p>
3959 <div class="highlight sh"><pre><span></span><span class="gp">$</span> pip install instaviz
3960 </pre></div>
3961 
3962 <p>Then, open up a REPL by running <code>python</code> at the command line with no arguments:</p>
3963 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">instaviz</span>
3964 <span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">example</span><span class="p">():</span>
3965 <span class="go">       a = 1</span>
3966 <span class="go">       b = a + 1</span>
3967 <span class="go">       return b</span>
3968 
3969 <span class="gp">&gt;&gt;&gt; </span><span class="n">instaviz</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">example</span><span class="p">)</span>
3970 </pre></div>
3971 
3972 <p>You&rsquo;ll see a notification on the command line that a web server has started on port <code>8080</code>. If that port is already in use, you can change it by calling <code>instaviz.show(example, port=9090)</code> or by passing another port number.</p>
3973 <p>In the web browser, you can see the detailed breakdown of your function:</p>
3974 <p><a href="https://files.realpython.com/media/screenshot.e148c89e3a9a.png" target="_blank"><img class="img-fluid mx-auto d-block border " src="https://files.realpython.com/media/screenshot.e148c89e3a9a.png" width="4802" height="2566" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/screenshot.e148c89e3a9a.png&amp;w=1200&amp;sig=eeb7b21839b90e3726ea5db7bfa0aac05730b1fe 1200w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/screenshot.e148c89e3a9a.png&amp;w=2401&amp;sig=aed4aed922bbe4ef9c665a90c9244ebaa950e032 2401w, https://files.realpython.com/media/screenshot.e148c89e3a9a.png 4802w" sizes="75vw" alt="Instaviz screenshot"/></a></p>
3975 <p>The bottom-left graph is the function you declared in the REPL, represented as an Abstract Syntax Tree. Each node in the tree is an AST type. These types are found in the <code>ast</code> module, and all inherit from <code>_ast.AST</code>.</p>
3976 <p>Unlike the CST, which has a generic child node property, some AST nodes have named properties that link them to their child nodes.</p>
3977 <p>For example, if you click on the Assign node in the center, this links to the line <code>b = a + 1</code>:</p>
3978 <p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.17_pm.a5df8d873988.png" target="_blank"><img class="img-fluid mx-auto d-block border w-75" src="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.17_pm.a5df8d873988.png" width="2226" height="1596" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.17_pm.a5df8d873988.png&amp;w=556&amp;sig=6a6f30034c85f411bd6c159df8aef50e899dda9c 556w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.17_pm.a5df8d873988.png&amp;w=1113&amp;sig=1b85cbf0bec4b114ef52d8a7bcdbfa08a1519237 1113w, https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.17_pm.a5df8d873988.png 2226w" sizes="75vw" alt="Instaviz screenshot 2"/></a></p>
3979 <p>It has two properties:</p>
3980 <ol>
3981 <li><strong><code>targets</code></strong> is a list of targets to assign to. It is a list because a single statement can assign the same value to several targets at once, as in <code>a = b = 1</code>.</li>
3982 <li><strong><code>value</code></strong> is the value to assign, which in this case is a <code>BinOp</code> expression, <code>a + 1</code>.</li>
3983 </ol>
3984 <p>If you click on the <code>BinOp</code> node, it shows the relevant properties:</p>
3985 <ul>
3986 <li><strong><code>left</code>:</strong> the node to the left of the operator</li>
3987 <li><strong><code>op</code>:</strong> the operator, in this case, an <code>Add</code> node (<code>+</code>) for addition</li>
3988 <li><strong><code>right</code>:</strong> the node to the right of the operator</li>
3989 </ul>
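<p>The same three properties are available on the <code>ast.BinOp</code> class. This sketch parses <code>a + 1</code> on its own in <code>"eval"</code> mode and reads each field; it assumes Python 3.8 or later, where number literals parse as <code>ast.Constant</code>.</p>

```python
import ast

# In "eval" mode the parser returns an Expression whose body is the BinOp.
binop = ast.parse("a + 1", mode="eval").body

print(type(binop).__name__)                           # BinOp
print(type(binop.left).__name__, binop.left.id)       # Name a
print(type(binop.op).__name__)                        # Add
print(type(binop.right).__name__, binop.right.value)  # Constant 1
```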
3990 <p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.37_pm.21a11b49a820.png" target="_blank"><img class="img-fluid mx-auto d-block border w-75" src="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.37_pm.21a11b49a820.png" width="1708" height="932" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.37_pm.21a11b49a820.png&amp;w=427&amp;sig=381d8bee4cc98dc98031abc6bc34ec376fd452b7 427w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.37_pm.21a11b49a820.png&amp;w=854&amp;sig=ccc0b88e9d762b2298a33ca2a03e826ee13f4193 854w, https://files.realpython.com/media/Screen_Shot_2019-03-19_at_1.24.37_pm.21a11b49a820.png 1708w" sizes="75vw" alt="Instaviz screenshot 3"/></a></p>
3991 <p>Building an AST from the CST in C is not a straightforward task, which is why <code>Python/ast.c</code> is over 5000 lines of code.</p>
3992 <p>There are a few entry points, forming part of the AST&rsquo;s public API. In the last section on the lexer and parser, you stopped when you reached the call to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L772"><code>PyAST_FromNodeObject()</code></a>. By this stage, the Python interpreter process had created a CST in the form of a <code>node *</code> tree.</p>
3993 <p>Jumping into <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L772"><code>PyAST_FromNodeObject()</code></a> inside <code>Python/ast.c</code>, you can see that it receives the <code>node *</code> tree, the filename, the compiler flags, and the <code>PyArena</code>.</p>
3994 <p>The return type from this function is <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/ast.h#L10"><code>mod_ty</code></a>, defined in <code>Include/Python-ast.h</code>. <code>mod_ty</code> is a container structure for one of the 5 module types in Python:</p>
3995 <ol>
3996 <li><code>Module</code> </li>
3997 <li><code>Interactive</code></li>
3998 <li><code>Expression</code></li>
3999 <li><code>FunctionType</code></li>
4000 <li><code>Suite</code></li>
4001 </ol>
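<p>At the Python level, the module type you get back is chosen by the <code>mode</code> argument of <code>ast.parse()</code> (or <code>compile()</code>). The following sketch, assuming Python 3.8 or later for <code>mode="func_type"</code>, prints the module type produced by each mode. <code>Suite</code> has no corresponding mode, so it does not appear.</p>

```python
import ast

# One sample source per parse mode; func_type needs Python 3.8 or later.
samples = {
    "exec": "x = 1\ny = 2",
    "single": "x = 1",
    "eval": "x + 1",
    "func_type": "(int, str) -> bool",
}

# Map each mode to the class name of the module node it produces.
module_types = {
    mode: type(ast.parse(src, mode=mode)).__name__ for mode, src in samples.items()
}

for mode, name in module_types.items():
    print(f"{mode:10} -> {name}")
```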
4002 <p>In <code>Include/Python-ast.h</code>, you can see that the <code>Expression</code> variant of <code>mod_ty</code> requires a single field, <code>body</code>, of type <code>expr_ty</code>, another type defined in <code>Include/Python-ast.h</code>. Here is the <code>_mod</code> structure that <code>mod_ty</code> points to:</p>
4003 <div class="highlight c"><pre><span></span><span class="k">enum</span> <span class="n">_mod_kind</span> <span class="p">{</span><span class="n">Module_kind</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">Interactive_kind</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">Expression_kind</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
4004                  <span class="n">FunctionType_kind</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">Suite_kind</span><span class="o">=</span><span class="mi">5</span><span class="p">};</span>
4005 <span class="k">struct</span> <span class="n">_mod</span> <span class="p">{</span>
4006     <span class="k">enum</span> <span class="n">_mod_kind</span> <span class="n">kind</span><span class="p">;</span>
4007     <span class="k">union</span> <span class="p">{</span>
4008         <span class="k">struct</span> <span class="p">{</span>
4009             <span class="n">asdl_seq</span> <span class="o">*</span><span class="n">body</span><span class="p">;</span>
4010             <span class="n">asdl_seq</span> <span class="o">*</span><span class="n">type_ignores</span><span class="p">;</span>
4011         <span class="p">}</span> <span class="n">Module</span><span class="p">;</span>
4012 
4013         <span class="k">struct</span> <span class="p">{</span>
4014             <span class="n">asdl_seq</span> <span class="o">*</span><span class="n">body</span><span class="p">;</span>
4015         <span class="p">}</span> <span class="n">Interactive</span><span class="p">;</span>
4016 
4017         <span class="k">struct</span> <span class="p">{</span>
4018             <span class="n">expr_ty</span> <span class="n">body</span><span class="p">;</span>
4019         <span class="p">}</span> <span class="n">Expression</span><span class="p">;</span>
4020 
4021         <span class="k">struct</span> <span class="p">{</span>
4022             <span class="n">asdl_seq</span> <span class="o">*</span><span class="n">argtypes</span><span class="p">;</span>
4023             <span class="n">expr_ty</span> <span class="n">returns</span><span class="p">;</span>
4024         <span class="p">}</span> <span class="n">FunctionType</span><span class="p">;</span>
4025 
4026         <span class="k">struct</span> <span class="p">{</span>
4027             <span class="n">asdl_seq</span> <span class="o">*</span><span class="n">body</span><span class="p">;</span>
4028         <span class="p">}</span> <span class="n">Suite</span><span class="p">;</span>
4029 
4030     <span class="p">}</span> <span class="n">v</span><span class="p">;</span>
4031 <span class="p">};</span>
4032 </pre></div>
4033 
4034 <p>The AST types are all listed in <code>Parser/Python.asdl</code>, including the module types, statement types, expression types, operators, and comprehensions. The type names in this file correspond to the classes generated for the AST and to the classes of the same name in the standard library&rsquo;s <code>ast</code> module.</p>
4035 <p>The parameters and names in <code>Include/Python-ast.h</code> correlate directly to those specified in <code>Parser/Python.asdl</code>:</p>
4036 <div class="highlight text"><pre><span></span>-- ASDL&#39;s 5 builtin types are:
4037 -- identifier, int, string, object, constant
4038 
4039 module Python
4040 {
4041     mod = Module(stmt* body, type_ignore *type_ignores)
4042         | Interactive(stmt* body)
4043 <span class="hll">        | Expression(expr body)
4044 </span>        | FunctionType(expr* argtypes, expr returns)
4045 </pre></div>
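<p>One way to see this correspondence from Python is to compare the <code>_fields</code> attribute of each <code>ast</code> module class with the field names in the ASDL excerpt above. This is a minimal check, assuming Python 3.8 or later; the dictionary of expected fields was typed in by hand from the ASDL definitions.</p>

```python
import ast

# Field names for the mod types, copied by hand from Parser/Python.asdl.
asdl_mod_fields = {
    "Module": ("body", "type_ignores"),
    "Interactive": ("body",),
    "Expression": ("body",),
    "FunctionType": ("argtypes", "returns"),
}

# Compare each ast class's declared fields with the ASDL definition.
for name, expected in asdl_mod_fields.items():
    node_class = getattr(ast, name)
    print(name, node_class._fields, node_class._fields == expected)
```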
4046 
4047 <p>The C header file and structures are there so that the <code>Python/ast.c</code> program can quickly generate the structures with pointers to the relevant data.</p>
4048 <p>Looking at <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L772"><code>PyAST_FromNodeObject()</code></a>, you can see that it is essentially a <code>switch</code> statement around the result from <code>TYPE(n)</code>. <code>TYPE()</code> is one of the core macros used by the AST to determine the type of a node in the concrete syntax tree. In the case of <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L772"><code>PyAST_FromNodeObject()</code></a>, it&rsquo;s only looking at the first node, so that node can only be one of the start symbols <code>file_input</code>, <code>eval_input</code>, <code>single_input</code>, or <code>func_type_input</code>, which produce a <code>Module</code>, <code>Expression</code>, <code>Interactive</code>, or <code>FunctionType</code>, respectively.</p>
4049 <p>The result of <code>TYPE()</code> will be either a symbol or token type, which you should be familiar with by this stage.</p>
4050 <p>For <code>file_input</code>, the result should be a <code>Module</code>. Modules are a series of statements, of which there are a few types. The logic to traverse the children of <code>n</code> and create statement nodes is within <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L4512"><code>ast_for_stmt()</code></a>. This function is called once for each statement in the module, or in a loop over each small statement when one line holds several statements separated by semicolons. The resulting <code>Module</code>, allocated in the <code>PyArena</code>, is then returned.</p>
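<p>You can observe the effect of this loop from Python: statements that share a line, separated by semicolons, still end up as separate entries in the flat <code>Module.body</code> list. A minimal sketch using the standard <code>ast</code> module:</p>

```python
import ast

# Two lines of source; the first holds two small statements separated by ";".
source = "x = 1; y = 2\nprint(x + y)\n"
module = ast.parse(source)

# Module.body is one flat list of statement nodes, however they were
# grouped on source lines.
stmt_types = [type(stmt).__name__ for stmt in module.body]
stmt_lines = [stmt.lineno for stmt in module.body]

print(stmt_types)   # ['Assign', 'Assign', 'Expr']
print(stmt_lines)   # [1, 1, 2]
```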
4051 <p>For <code>eval_input</code>, the result should be an <code>Expression</code>. The node returned by <code>CHILD(n, 0)</code>, which is the first child of <code>n</code>, is passed to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L3246"><code>ast_for_testlist()</code></a>, which returns an <code>expr_ty</code> type. This <code>expr_ty</code> is sent to <code>Expression()</code> with the <code>PyArena</code> to create an expression node, which is then passed back as the result:</p>
4052 <div class="highlight c"><pre><span></span><span class="n">mod_ty</span>
4053 <span class="nf">PyAST_FromNodeObject</span><span class="p">(</span><span class="k">const</span> <span class="n">node</span> <span class="o">*</span><span class="n">n</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">,</span>
4054                      <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyArena</span> <span class="o">*</span><span class="n">arena</span><span class="p">)</span>
4055 <span class="p">{</span>
4056     <span class="p">...</span>
4057     <span class="k">switch</span> <span class="p">(</span><span class="n">TYPE</span><span class="p">(</span><span class="n">n</span><span class="p">))</span> <span class="p">{</span>
4058         <span class="k">case</span> <span class="nl">file_input</span><span class="p">:</span>
4059             <span class="n">stmts</span> <span class="o">=</span> <span class="n">_Py_asdl_seq_new</span><span class="p">(</span><span class="n">num_stmts</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="n">arena</span><span class="p">);</span>
4060             <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">stmts</span><span class="p">)</span>
4061                 <span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
4062             <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">NCH</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
4063                 <span class="n">ch</span> <span class="o">=</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
4064                 <span class="k">if</span> <span class="p">(</span><span class="n">TYPE</span><span class="p">(</span><span class="n">ch</span><span class="p">)</span> <span class="o">==</span> <span class="n">NEWLINE</span><span class="p">)</span>
4065                     <span class="k">continue</span><span class="p">;</span>
4066                 <span class="n">REQ</span><span class="p">(</span><span class="n">ch</span><span class="p">,</span> <span class="n">stmt</span><span class="p">);</span>
4067                 <span class="n">num</span> <span class="o">=</span> <span class="n">num_stmts</span><span class="p">(</span><span class="n">ch</span><span class="p">);</span>
4068                 <span class="k">if</span> <span class="p">(</span><span class="n">num</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
4069 <span class="hll">                    <span class="n">s</span> <span class="o">=</span> <span class="n">ast_for_stmt</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="n">ch</span><span class="p">);</span>
4070 </span>                    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">s</span><span class="p">)</span>
4071                         <span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
4072                     <span class="n">asdl_seq_SET</span><span class="p">(</span><span class="n">stmts</span><span class="p">,</span> <span class="n">k</span><span class="o">++</span><span class="p">,</span> <span class="n">s</span><span class="p">);</span>
4073                 <span class="p">}</span>
4074                 <span class="k">else</span> <span class="p">{</span>
4075                     <span class="n">ch</span> <span class="o">=</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">ch</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
4076                     <span class="n">REQ</span><span class="p">(</span><span class="n">ch</span><span class="p">,</span> <span class="n">simple_stmt</span><span class="p">);</span>
4077                     <span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="n">num</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
4078 <span class="hll">                        <span class="n">s</span> <span class="o">=</span> <span class="n">ast_for_stmt</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">ch</span><span class="p">,</span> <span class="n">j</span> <span class="o">*</span> <span class="mi">2</span><span class="p">));</span>
4079 </span>                        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">s</span><span class="p">)</span>
4080                             <span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
4081                         <span class="n">asdl_seq_SET</span><span class="p">(</span><span class="n">stmts</span><span class="p">,</span> <span class="n">k</span><span class="o">++</span><span class="p">,</span> <span class="n">s</span><span class="p">);</span>
4082                     <span class="p">}</span>
4083                 <span class="p">}</span>
4084             <span class="p">}</span>
4085 
4086             <span class="cm">/* Type ignores are stored under the ENDMARKER in file_input. */</span>
4087             <span class="p">...</span>
4088 
4089 <span class="hll">            <span class="n">res</span> <span class="o">=</span> <span class="n">Module</span><span class="p">(</span><span class="n">stmts</span><span class="p">,</span> <span class="n">type_ignores</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
4090 </span>            <span class="k">break</span><span class="p">;</span>
4091         <span class="k">case</span> <span class="nl">eval_input</span><span class="p">:</span> <span class="p">{</span>
4092             <span class="n">expr_ty</span> <span class="n">testlist_ast</span><span class="p">;</span>
4093 
4094             <span class="cm">/* XXX Why not comp_for here? */</span>
4095 <span class="hll">            <span class="n">testlist_ast</span> <span class="o">=</span> <span class="n">ast_for_testlist</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
4096 </span>            <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">testlist_ast</span><span class="p">)</span>
4097                 <span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
4098 <span class="hll">            <span class="n">res</span> <span class="o">=</span> <span class="n">Expression</span><span class="p">(</span><span class="n">testlist_ast</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
4099 </span>            <span class="k">break</span><span class="p">;</span>
4100         <span class="p">}</span>
4101         <span class="k">case</span> <span class="nl">single_input</span><span class="p">:</span>
4102             <span class="p">...</span>
4103             <span class="k">break</span><span class="p">;</span>
4104         <span class="k">case</span> <span class="nl">func_type_input</span><span class="p">:</span>
4105             <span class="p">...</span>
4106         <span class="p">...</span>
4107     <span class="k">return</span> <span class="n">res</span><span class="p">;</span>
4108 <span class="p">}</span>
4109 </pre></div>
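<p>The <code>eval_input</code> path corresponds to <code>mode="eval"</code> in <code>ast.parse()</code>. As a sketch, the snippet below shows that the resulting <code>Expression</code> wraps exactly one expression node, and that it can be compiled and evaluated directly; the filename <code>expression.py</code> is an arbitrary placeholder.</p>

```python
import ast

# mode="eval" yields an Expression wrapping a single node in its body field.
tree = ast.parse("2 * (3 + 4)", mode="eval")
print(type(tree).__name__, type(tree.body).__name__)   # Expression BinOp

# An Expression node can be compiled in "eval" mode and evaluated directly.
code = compile(tree, filename="expression.py", mode="eval")
result = eval(code)
print(result)   # 14
```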
4110 
4111 <p>Inside the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L4512"><code>ast_for_stmt()</code></a> function, there is another <code>switch</code> statement over each possible statement type (<code>simple_stmt</code>, <code>compound_stmt</code>, and so on), along with the code that determines the arguments for each node class.</p>
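<p>The statement types that <code>ast_for_stmt()</code> dispatches on surface in Python as node classes. A quick sketch that lists the class of each top-level statement in a small, made-up module:</p>

```python
import ast

# A made-up module mixing simple and compound statements.
source = """
import math
total = 0
for n in range(3):
    total += n
def area(r):
    return math.pi * r ** 2
"""

# Each top-level statement becomes a node whose class names the statement type.
top_level = [type(stmt).__name__ for stmt in ast.parse(source).body]
print(top_level)   # ['Import', 'Assign', 'For', 'FunctionDef']
```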
4112 <p>One of the simpler functions is <code>ast_for_power()</code>, which handles the power expression; for example, <code>2**4</code> is 2 to the power of 4. It starts by calling <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ast.c#L2770"><code>ast_for_atom_expr()</code></a> on the first child, which gives the number <code>2</code> in this example. If the <code>power</code> node has only one child, the function returns that atomic expression. If it has more than one child, it gets the right-hand side (the number <code>4</code>) and returns a <code>BinOp</code> (binary operation) with the operator as <code>Pow</code> (power), the left-hand side <code>e</code> (2), and the right-hand side <code>f</code> (4):</p>
4113 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">expr_ty</span>
4114 <span class="nf">ast_for_power</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiling</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="k">const</span> <span class="n">node</span> <span class="o">*</span><span class="n">n</span><span class="p">)</span>
4115 <span class="p">{</span>
4116     <span class="cm">/* power: atom trailer* (&#39;**&#39; factor)*</span>
4117 <span class="cm">     */</span>
4118     <span class="n">expr_ty</span> <span class="n">e</span><span class="p">;</span>
4119     <span class="n">REQ</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">power</span><span class="p">);</span>
4120     <span class="n">e</span> <span class="o">=</span> <span class="n">ast_for_atom_expr</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
4121     <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">e</span><span class="p">)</span>
4122         <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
4123     <span class="k">if</span> <span class="p">(</span><span class="n">NCH</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span>
4124         <span class="k">return</span> <span class="n">e</span><span class="p">;</span>
4125     <span class="k">if</span> <span class="p">(</span><span class="n">TYPE</span><span class="p">(</span><span class="n">CHILD</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">NCH</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span> <span class="o">==</span> <span class="n">factor</span><span class="p">)</span> <span class="p">{</span>
4126         <span class="n">expr_ty</span> <span class="n">f</span> <span class="o">=</span> <span class="n">ast_for_expr</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">CHILD</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">NCH</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span>
4127         <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">f</span><span class="p">)</span>
4128             <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
4129         <span class="n">e</span> <span class="o">=</span> <span class="n">BinOp</span><span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="n">Pow</span><span class="p">,</span> <span class="n">f</span><span class="p">,</span> <span class="n">LINENO</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="n">n</span><span class="o">-&gt;</span><span class="n">n_col_offset</span><span class="p">,</span>
4130                   <span class="n">n</span><span class="o">-&gt;</span><span class="n">n_end_lineno</span><span class="p">,</span> <span class="n">n</span><span class="o">-&gt;</span><span class="n">n_end_col_offset</span><span class="p">,</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">c_arena</span><span class="p">);</span>
4131     <span class="p">}</span>
4132     <span class="k">return</span> <span class="n">e</span><span class="p">;</span>
4133 <span class="p">}</span>
4134 </pre></div>
4135 
4136 <p>You can see the result of this if you send a short function to the <code>instaviz</code> module:</p>
4137 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
4138 <span class="gp">... </span>    <span class="mi">2</span><span class="o">**</span><span class="mi">4</span>
4139 <span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">instaviz</span>
4140 <span class="gp">&gt;&gt;&gt; </span><span class="n">instaviz</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">foo</span><span class="p">)</span>
4141 </pre></div>
4142 
4143 <p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.34.51_pm.c3a1e8d717f5.png" target="_blank"><img class="img-fluid mx-auto d-block border w-75" src="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.34.51_pm.c3a1e8d717f5.png" width="1708" height="1094" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_5.34.51_pm.c3a1e8d717f5.png&amp;w=427&amp;sig=68167794b71cec6447aa8fcb4d22ca738a2e2ce4 427w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_5.34.51_pm.c3a1e8d717f5.png&amp;w=854&amp;sig=feeef6a7ce0b14d39c57956f77320125bcce43bf 854w, https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.34.51_pm.c3a1e8d717f5.png 1708w" sizes="75vw" alt="Instaviz screenshot 4"/></a></p>
4144 <p>In the UI, you can also see the corresponding properties:</p>
4145 <p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.36.34_pm.0067235460de.png" target="_blank"><img class="img-fluid mx-auto d-block border w-75" src="https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.36.34_pm.0067235460de.png" width="1708" height="630" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_5.36.34_pm.0067235460de.png&amp;w=427&amp;sig=91663308ef854874e262756b2a28e598d263fff5 427w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-19_at_5.36.34_pm.0067235460de.png&amp;w=854&amp;sig=fbfcdd33bcf685f91ebe0559b4135ead2b296d7e 854w, https://files.realpython.com/media/Screen_Shot_2019-03-19_at_5.36.34_pm.0067235460de.png 1708w" sizes="75vw" alt="Instaviz screenshot 5"/></a></p>
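<p>You can also inspect the result without <code>instaviz</code>. This sketch, assuming Python 3.8 or later, parses <code>2**4</code> and reads the <code>BinOp</code> fields. It also parses <code>2**3**2</code>: because the right-hand side of <code>**</code> is parsed as a <code>factor</code>, which can itself contain a power, chained powers nest to the right.</p>

```python
import ast

power = ast.parse("2**4", mode="eval").body
print(type(power).__name__, type(power.op).__name__)   # BinOp Pow
print(power.left.value, power.right.value)             # 2 4

# The right operand of ** is a factor that may hold another power,
# so 2**3**2 parses as 2**(3**2).
chained = ast.parse("2**3**2", mode="eval").body
print(isinstance(chained.right, ast.BinOp))            # True
print(chained.left.value, chained.right.left.value, chained.right.right.value)  # 2 3 2
```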
4146 <p>In summary, each statement type and expression has a corresponding <code>ast_for_*()</code> function to create it. The arguments are defined in <code>Parser/Python.asdl</code> and exposed via the <code>ast</code> module in the standard library. If an expression or statement has children, then it will call the corresponding <code>ast_for_*</code> child function in a depth-first traversal.</p>
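<p>The depth-first shape of that traversal can be sketched in Python. The hypothetical <code>depth_first()</code> generator below walks an already-built tree with <code>ast.iter_child_nodes()</code>, visiting children in field order. It mirrors the order in which the <code>ast_for_*()</code> calls recurse, but it is not how <code>Python/ast.c</code> builds the tree. The standard <code>ast.walk()</code> yields nodes in no specified order, which is why the sketch defines its own walker.</p>

```python
import ast

def depth_first(node, depth=0):
    """Yield (depth, class name) pairs in pre-order, children in field order.

    Hypothetical helper: it walks an existing tree and only mirrors the
    shape of the recursive ast_for_*() calls, not their implementation.
    """
    yield depth, type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from depth_first(child, depth + 1)

visited = list(depth_first(ast.parse("2**4", mode="eval")))
for depth, name in visited:
    print("  " * depth + name)
```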
4147 <h3 id="conclusion_1">Conclusion</h3>
4148 <p>CPython&rsquo;s versatility and low-level execution API make it an ideal candidate for an embedded scripting engine. You will see CPython used as a scripting engine in many applications, in areas such as game design, 3D graphics, and system automation.</p>
4149 <p>The interpreter process is flexible and efficient, and now that you understand how it works, you&rsquo;re ready to understand the compiler.</p>
4150 <h2 h1="h1" id="part-3-the-cpython-compiler-and-execution-loop">Part 3: The CPython Compiler and Execution Loop</h2>
4151 <p>In Part 2, you saw how the CPython interpreter takes an input, such as a file or string, and converts it into a logical Abstract Syntax Tree. We&rsquo;re still not at the stage where this code can be executed. Next, we have to go deeper to convert the Abstract Syntax Tree into a set of sequential instructions that the CPython virtual machine can execute.</p>
4152 <h3 id="compiling">Compiling</h3>
4153 <p>Now the interpreter has an AST with the properties required for each of the operations, functions, classes, and namespaces. It is the job of the compiler to turn the AST into something the CPython virtual machine can execute.</p>
4154 <p>This compilation task is split into 2 parts:</p>
4155 <ol>
4156 <li>Traverse the tree and create a control-flow graph (CFG), which represents the logical sequence for execution</li>
4157 <li>Convert the nodes in the CFG to smaller, executable statements, known as bytecode</li>
4158 </ol>
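<p>From Python, both steps are hidden inside the built-in <code>compile()</code> function; the control-flow graph is internal to the compiler and is not exposed. The sketch below passes an AST to <code>compile()</code> and lists the resulting bytecode with the <code>dis</code> module. The exact opcodes differ between Python versions, so treat the listing as illustrative.</p>

```python
import ast
import dis

source = "b = a + 1"
tree = ast.parse(source, filename="example.py", mode="exec")

# compile() accepts the AST directly; the CFG and assembly stages happen
# inside it, and the result is a code object holding the bytecode.
code = compile(tree, filename="example.py", mode="exec")

opnames = [instruction.opname for instruction in dis.get_instructions(code)]
print(type(code).__name__, len(code.co_code), "bytes of bytecode")
dis.dis(code)   # opcode names and layout vary between Python versions

# The code object runs like any other: a is read from, and b written to, the namespace.
namespace = {"a": 41}
exec(code, namespace)
print(namespace["b"])   # 42
```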
4159 <p>Earlier, we were looking at how files are executed, and at the <code>PyRun_FileExFlags()</code> function in <code>Python/pythonrun.c</code>. Inside this function, we converted the <code>FILE</code> handle into a <code>mod</code> of type <code>mod_ty</code>. This task was completed by <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1369"><code>PyParser_ASTFromFileObject()</code></a>, which in turn calls the tokenizer, the parser-tokenizer, and then the AST:</p>
4160 <div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
4161 <span class="nf">PyRun_FileExFlags</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">filename_str</span><span class="p">,</span> <span class="kt">int</span> <span class="n">start</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span>
4162                   <span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span> <span class="kt">int</span> <span class="n">closeit</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">)</span>
4163 <span class="p">{</span>
4164  <span class="p">...</span>
4165 <span class="hll">    <span class="n">mod</span> <span class="o">=</span> <span class="n">PyParser_ASTFromFileObject</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span>
4166 </span> <span class="p">...</span>
4167 <span class="hll">    <span class="n">ret</span> <span class="o">=</span> <span class="n">run_mod</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
4168 </span><span class="p">}</span>
4169 </pre></div>
4170 
4171 <p>The module returned by <code>PyParser_ASTFromFileObject()</code> is sent to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1125"><code>run_mod()</code></a>, still in <code>Python/pythonrun.c</code>. This is a small function that gets a <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/code.h#L69"><code>PyCodeObject</code></a> from <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a> and sends it on to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a>. You will tackle <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a> in the next section:</p>
4172 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span>
4173 <span class="nf">run_mod</span><span class="p">(</span><span class="n">mod_ty</span> <span class="n">mod</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span>
4174             <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">,</span> <span class="n">PyArena</span> <span class="o">*</span><span class="n">arena</span><span class="p">)</span>
4175 <span class="p">{</span>
4176     <span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span><span class="p">;</span>
4177     <span class="n">PyObject</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
4178 <span class="hll">    <span class="n">co</span> <span class="o">=</span> <span class="n">PyAST_CompileObject</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">arena</span><span class="p">);</span>
4179 </span>    <span class="k">if</span> <span class="p">(</span><span class="n">co</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
4180         <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
4181 
4182     <span class="k">if</span> <span class="p">(</span><span class="n">PySys_Audit</span><span class="p">(</span><span class="s">&quot;exec&quot;</span><span class="p">,</span> <span class="s">&quot;O&quot;</span><span class="p">,</span> <span class="n">co</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
4183         <span class="n">Py_DECREF</span><span class="p">(</span><span class="n">co</span><span class="p">);</span>
4184         <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
4185     <span class="p">}</span>
4186 
4187 <span class="hll">    <span class="n">v</span> <span class="o">=</span> <span class="n">run_eval_code_obj</span><span class="p">(</span><span class="n">co</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">);</span>
4188 </span>    <span class="n">Py_DECREF</span><span class="p">(</span><span class="n">co</span><span class="p">);</span>
4189     <span class="k">return</span> <span class="n">v</span><span class="p">;</span>
4190 <span class="p">}</span>
4191 </pre></div>
4192 
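4193 <p>The <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a> function is the main entry point to the CPython compiler. It takes a Python module (the AST) as its primary argument, along with the name of the file, the compiler flags, an optimization level (<code>run_mod()</code> passes <code>-1</code> to use the default), and the <code>PyArena</code> created earlier in the interpreter process. The globals and locals are only used afterward, when <code>run_mod()</code> passes the finished code object to <code>run_eval_code_obj()</code>.</p>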
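<p>At the Python level, the built-in <code>compile()</code> function accepts an AST object plus a filename, flags, and an optimization level. This roughly mirrors the arguments of <code>PyAST_CompileObject()</code>. The sketch below loosely follows <code>run_mod()</code>: it compiles an AST into a code object, observes the <code>"exec"</code> audit event, and runs the code against a fresh globals dictionary. It assumes Python 3.8 or later, where <code>sys.addaudithook()</code> exists. The names <code>record_exec</code>, <code>tree</code>, and <code>namespace</code> are illustrative only:</p>
<div class="highlight python"><pre>
import ast
import sys
import types

exec_events = []


def record_exec(event, args):
    # Audit hooks see every event; keep only "exec", whose argument is the code object
    if event == "exec":
        exec_events.append(args[0])


sys.addaudithook(record_exec)  # note: audit hooks cannot be removed once added

# Parse source into an AST (a Module node)
tree = ast.parse("total = 40 + 2", filename="example.py", mode="exec")

# Compile the AST into a code object; optimize=-1 means "use the interpreter's level"
code = compile(tree, "example.py", "exec", flags=0, dont_inherit=True, optimize=-1)

# Execute the code object with its own globals dictionary
namespace = {}
exec(code, namespace)

print(isinstance(code, types.CodeType))  # True
print(namespace["total"])                # 42
print(code in exec_events)               # True
</pre></div>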
4194 <p>We&rsquo;re starting to get into the guts of the CPython compiler now, with decades of development and Computer Science theory behind it. Don&rsquo;t be put off by the language. Once we break down the compiler into logical steps, it&rsquo;ll make sense.</p>
4195 <p>Before the compiler starts, a global compiler state is created. This type, <code>compiler</code>, is defined in <code>Python/compile.c</code> and contains fields used by the compiler to remember the compiler flags, the stack, and the <code>PyArena</code>:</p>
4196 <div class="highlight c"><pre><span></span><span class="k">struct</span> <span class="n">compiler</span> <span class="p">{</span>
4197     <span class="n">PyObject</span> <span class="o">*</span><span class="n">c_filename</span><span class="p">;</span>
4198     <span class="k">struct</span> <span class="n">symtable</span> <span class="o">*</span><span class="n">c_st</span><span class="p">;</span>
4199     <span class="n">PyFutureFeatures</span> <span class="o">*</span><span class="n">c_future</span><span class="p">;</span> <span class="cm">/* pointer to module&#39;s __future__ */</span>
4200     <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">c_flags</span><span class="p">;</span>
4201 
4202     <span class="kt">int</span> <span class="n">c_optimize</span><span class="p">;</span>              <span class="cm">/* optimization level */</span>
4203     <span class="kt">int</span> <span class="n">c_interactive</span><span class="p">;</span>           <span class="cm">/* true if in interactive mode */</span>
4204     <span class="kt">int</span> <span class="n">c_nestlevel</span><span class="p">;</span>
4205     <span class="kt">int</span> <span class="n">c_do_not_emit_bytecode</span><span class="p">;</span>  <span class="cm">/* The compiler won&#39;t emit any bytecode</span>
4206 <span class="cm">                                    if this value is different from zero.</span>
4207 <span class="cm">                                    This can be used to temporarily visit</span>
4208 <span class="cm">                                    nodes without emitting bytecode to</span>
4209 <span class="cm">                                    check only errors. */</span>
4210 
4211     <span class="n">PyObject</span> <span class="o">*</span><span class="n">c_const_cache</span><span class="p">;</span>     <span class="cm">/* Python dict holding all constants,</span>
4212 <span class="cm">                                    including names tuple */</span>
4213     <span class="k">struct</span> <span class="n">compiler_unit</span> <span class="o">*</span><span class="n">u</span><span class="p">;</span> <span class="cm">/* compiler state for current block */</span>
4214     <span class="n">PyObject</span> <span class="o">*</span><span class="n">c_stack</span><span class="p">;</span>           <span class="cm">/* Python list holding compiler_unit ptrs */</span>
4215     <span class="n">PyArena</span> <span class="o">*</span><span class="n">c_arena</span><span class="p">;</span>            <span class="cm">/* pointer to memory allocation arena */</span>
4216 <span class="p">};</span>
4217 </pre></div>
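<p>The <code>u</code> and <code>c_stack</code> fields suggest that the compiler keeps separate state for each block it compiles. One way to see the result from Python is that every function body becomes its own code object, stored among the constants of the enclosing code object. This is a small illustration assuming CPython 3.8 or later. <code>find_code()</code> is a hypothetical helper written for the example:</p>
<div class="highlight python"><pre>
import types

SOURCE = """
def outer():
    def inner():
        return 1
    return inner
"""


def find_code(parent, name):
    # Nested code objects are stored in the parent's co_consts tuple
    for const in parent.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == name:
            return const
    raise LookupError(name)


module_code = compile(SOURCE, "example.py", "exec")
outer_code = find_code(module_code, "outer")
inner_code = find_code(outer_code, "inner")

print(outer_code.co_name, inner_code.co_name)  # outer inner
</pre></div>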
4218 
4219 <p>Inside <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a>, there are 11 main steps happening:</p>
4220 <ol>
4221 <li>Create the interned string <code>__doc__</code> used by the compiler, if it doesn&rsquo;t exist yet (it&rsquo;s cached for later calls).</li>
4222 <li>Do the same for the interned string <code>__annotations__</code>.</li>
4223 <li>Set the filename of the global compiler state to the filename argument.</li>
4224 <li>Set the memory allocation arena for the compiler to the one used by the interpreter.</li>
4225 <li>Copy any <code>__future__</code> flags in the module to the future flags in the compiler.</li>
4226 <li>Merge runtime flags provided by the command-line or environment variables.</li>
4227 <li>Enable any <code>__future__</code> features in the compiler.</li>
4228 <li>Set the optimization level to the provided argument, or default.</li>
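4229 <li>Run the AST optimizer, <code>_PyAST_Optimize()</code> (which performs transformations such as constant folding), then build a symbol table from the module object.</li>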
4230 <li>Run the compiler with the compiler state and return the code object.</li>
4231 <li>Free any allocated memory by the compiler.</li>
4232 </ol>
4233 <div class="highlight c"><pre><span></span><span class="n">PyCodeObject</span> <span class="o">*</span>
4234 <span class="nf">PyAST_CompileObject</span><span class="p">(</span><span class="n">mod_ty</span> <span class="n">mod</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyCompilerFlags</span> <span class="o">*</span><span class="n">flags</span><span class="p">,</span>
4235                    <span class="kt">int</span> <span class="n">optimize</span><span class="p">,</span> <span class="n">PyArena</span> <span class="o">*</span><span class="n">arena</span><span class="p">)</span>
4236 <span class="p">{</span>
4237     <span class="k">struct</span> <span class="n">compiler</span> <span class="n">c</span><span class="p">;</span>
4238     <span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
4239     <span class="n">PyCompilerFlags</span> <span class="n">local_flags</span> <span class="o">=</span> <span class="n">_PyCompilerFlags_INIT</span><span class="p">;</span>
4240     <span class="kt">int</span> <span class="n">merged</span><span class="p">;</span>
4241     <span class="n">PyConfig</span> <span class="o">*</span><span class="n">config</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">_PyInterpreterState_GET_UNSAFE</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">config</span><span class="p">;</span>
4242 <span class="hll">
4243 </span>    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">__doc__</span><span class="p">)</span> <span class="p">{</span>
4244         <span class="n">__doc__</span> <span class="o">=</span> <span class="n">PyUnicode_InternFromString</span><span class="p">(</span><span class="s">&quot;__doc__&quot;</span><span class="p">);</span>
4245         <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">__doc__</span><span class="p">)</span>
4246             <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
4247     <span class="p">}</span>
4248 <span class="hll">    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">__annotations__</span><span class="p">)</span> <span class="p">{</span>
4249 </span>        <span class="n">__annotations__</span> <span class="o">=</span> <span class="n">PyUnicode_InternFromString</span><span class="p">(</span><span class="s">&quot;__annotations__&quot;</span><span class="p">);</span>
4250         <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">__annotations__</span><span class="p">)</span>
4251             <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
4252     <span class="p">}</span>
4253     <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">compiler_init</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">))</span>
4254         <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
4255 <span class="hll">    <span class="n">Py_INCREF</span><span class="p">(</span><span class="n">filename</span><span class="p">);</span>
4256 </span><span class="hll">    <span class="n">c</span><span class="p">.</span><span class="n">c_filename</span> <span class="o">=</span> <span class="n">filename</span><span class="p">;</span>
4257 </span><span class="hll">    <span class="n">c</span><span class="p">.</span><span class="n">c_arena</span> <span class="o">=</span> <span class="n">arena</span><span class="p">;</span>
4258 </span>    <span class="n">c</span><span class="p">.</span><span class="n">c_future</span> <span class="o">=</span> <span class="n">PyFuture_FromASTObject</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">);</span>
4259     <span class="k">if</span> <span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">c_future</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
4260         <span class="k">goto</span> <span class="n">finally</span><span class="p">;</span>
4261     <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">flags</span><span class="p">)</span> <span class="p">{</span>
4262         <span class="n">flags</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">local_flags</span><span class="p">;</span>
4263 <span class="hll">    <span class="p">}</span>
4264 </span><span class="hll">    <span class="n">merged</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">c_future</span><span class="o">-&gt;</span><span class="n">ff_features</span> <span class="o">|</span> <span class="n">flags</span><span class="o">-&gt;</span><span class="n">cf_flags</span><span class="p">;</span>
4265 </span>    <span class="n">c</span><span class="p">.</span><span class="n">c_future</span><span class="o">-&gt;</span><span class="n">ff_features</span> <span class="o">=</span> <span class="n">merged</span><span class="p">;</span>
4266     <span class="n">flags</span><span class="o">-&gt;</span><span class="n">cf_flags</span> <span class="o">=</span> <span class="n">merged</span><span class="p">;</span>
4267 <span class="hll">    <span class="n">c</span><span class="p">.</span><span class="n">c_flags</span> <span class="o">=</span> <span class="n">flags</span><span class="p">;</span>
4268 </span>    <span class="n">c</span><span class="p">.</span><span class="n">c_optimize</span> <span class="o">=</span> <span class="p">(</span><span class="n">optimize</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">?</span> <span class="n">config</span><span class="o">-&gt;</span><span class="nl">optimization_level</span> <span class="p">:</span> <span class="n">optimize</span><span class="p">;</span>
4269     <span class="n">c</span><span class="p">.</span><span class="n">c_nestlevel</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
4270     <span class="n">c</span><span class="p">.</span><span class="n">c_do_not_emit_bytecode</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
4271 
4272     <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">_PyAST_Optimize</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">arena</span><span class="p">,</span> <span class="n">c</span><span class="p">.</span><span class="n">c_optimize</span><span class="p">))</span> <span class="p">{</span>
4273         <span class="k">goto</span> <span class="n">finally</span><span class="p">;</span>
4274 <span class="hll">    <span class="p">}</span>
4275 </span>
4276     <span class="n">c</span><span class="p">.</span><span class="n">c_st</span> <span class="o">=</span> <span class="n">PySymtable_BuildObject</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">c</span><span class="p">.</span><span class="n">c_future</span><span class="p">);</span>
4277     <span class="k">if</span> <span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">c_st</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
4278         <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">PyErr_Occurred</span><span class="p">())</span>
4279             <span class="n">PyErr_SetString</span><span class="p">(</span><span class="n">PyExc_SystemError</span><span class="p">,</span> <span class="s">&quot;no symtable&quot;</span><span class="p">);</span>
4280         <span class="k">goto</span> <span class="n">finally</span><span class="p">;</span>
4281 <span class="hll">    <span class="p">}</span>
4282 </span>
4283     <span class="n">co</span> <span class="o">=</span> <span class="n">compiler_mod</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="n">mod</span><span class="p">);</span>
4284 
4285  <span class="nl">finally</span><span class="p">:</span>
4286     <span class="n">compiler_free</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">);</span>
4287     <span class="n">assert</span><span class="p">(</span><span class="n">co</span> <span class="o">||</span> <span class="n">PyErr_Occurred</span><span class="p">());</span>
4288     <span class="k">return</span> <span class="n">co</span><span class="p">;</span>
4289 <span class="p">}</span>
4290 </pre></div>
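<p>Two of these steps can be observed from Python. In the CPython version covered here, <code>_PyAST_Optimize()</code> folds constant expressions before any bytecode is generated. The optimization level also falls back to the interpreter&rsquo;s setting when <code>optimize</code> is <code>-1</code>. The sketch below assumes CPython 3.8 or later. Because the layout of <code>co_consts</code> differs between versions, it only checks for membership:</p>
<div class="highlight python"><pre>
import sys

# Folding: the two string constants are combined into one before code generation
folded = compile('greeting = "spam" + "eggs"', "example.py", "exec")
print("spameggs" in folded.co_consts)  # True

# optimize=-1 falls back to the interpreter's level, which sys.flags.optimize reports
default_level = compile("assert False", "example.py", "exec", optimize=-1)
try:
    exec(default_level, {})
    asserts_removed = True
except AssertionError:
    asserts_removed = False
print(asserts_removed == bool(sys.flags.optimize))  # True
</pre></div>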
4291 
4292 <h4 id="future-flags-and-compiler-flags">Future Flags and Compiler Flags</h4>
4293 <p>Before the compiler runs, there are two types of flags to toggle the features inside the compiler. These come from two places:</p>
4294 <ol>
4295 <li>The interpreter state, which may have been set by command-line options, in <code>pyconfig.h</code>, or via environment variables</li>
4296 <li>The use of <code>__future__</code> statements inside the actual source code of the module</li>
4297 </ol>
4298 <p>To distinguish the two types of flags, remember that <code>__future__</code> flags are required by the syntax or features used in that specific module. For example, Python 3.7 introduced delayed evaluation of type hints through the <code>annotations</code> future flag:</p>
4299 <div class="highlight python"><pre><span></span><span class="kn">from</span> <span class="nn">__future__</span> <span class="k">import</span> <span class="n">annotations</span>
4300 </pre></div>
4301 
4302 <p>The code after this statement might use unresolved type hints, so the <code>__future__</code> statement is required. Otherwise, the module wouldn&rsquo;t import. It would be unmaintainable to manually request that the person importing the module enable this specific compiler flag.</p>
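<p>The same feature can also be switched on without a <code>from __future__</code> statement by passing the feature&rsquo;s compiler flag to <code>compile()</code>. Each <code>__future__</code> feature object exposes this bit as <code>compiler_flag</code>. In the sketch below, <code>UndefinedType</code> is a deliberately undefined, hypothetical name. With the flag set, the annotation is stored as a string instead of being evaluated. It assumes Python 3.7 or later:</p>
<div class="highlight python"><pre>
import __future__

SOURCE = """
def handler(event: UndefinedType) -> None:
    pass

result = handler.__annotations__
"""

flag = __future__.annotations.compiler_flag
# dont_inherit=True stops compile() from also inheriting the caller's own future flags
code = compile(SOURCE, "example.py", "exec", flags=flag, dont_inherit=True)

namespace = {}
exec(code, namespace)
print(namespace["result"])  # {'event': 'UndefinedType', 'return': 'None'}
</pre></div>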
4303 <p>The other compiler flags are specific to the environment, so they might change the way the code executes or the way the compiler runs, but they aren&rsquo;t tied to the source in the same way that <code>__future__</code> statements are.</p>
4304 <p>One example of a compiler flag would be the <a href="https://docs.python.org/3/using/cmdline.html#cmdoption-o"><code>-O</code> flag for optimizing the use of <code>assert</code> statements</a>. This flag disables any <code>assert</code> statements, which may have been put in the code for <a href="https://realpython.com/python-debugging-pdb/">debugging purposes</a>.
4305 It can also be enabled with the <code>PYTHONOPTIMIZE=1</code> environment variable setting.</p>
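<p>The built-in <code>compile()</code> function exposes the same optimization levels through its <code>optimize</code> argument. This means the effect of <code>-O</code> can be reproduced for a single code object without restarting the interpreter. In this small sketch, <code>run()</code> is a hypothetical helper that reports whether the assertion fired:</p>
<div class="highlight python"><pre>
SOURCE = "assert False, 'debug check failed'"


def run(optimize):
    # Compile the same source at the given optimization level and execute it
    code = compile(SOURCE, "example.py", "exec", optimize=optimize)
    try:
        exec(code, {})
    except AssertionError:
        return "assert fired"
    return "assert skipped"


print(run(0))  # assert fired: level 0 keeps assert statements
print(run(1))  # assert skipped: level 1 (like -O) removes them
</pre></div>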
4306 <h4 id="symbol-tables">Symbol Tables</h4>
4307 <p>In <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a>, there was a reference to a <code>symtable</code> and a call to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L262"><code>PySymtable_BuildObject()</code></a> with the module to be executed.</p>
4308 <p>The purpose of the symbol table is to provide a list of namespaces, globals, and locals for the compiler to use for referencing and resolving scopes.</p>
4309 <p>The <code>symtable</code> structure in <code>Include/symtable.h</code> is well documented, so it&rsquo;s clear what each of the fields is for. There is one symtable instance per compilation, so namespacing becomes essential.</p>
4310 <p>If you create a function called <code>resolve_names()</code> in one module and declare another function with the same name in another module, you want to be sure which one is called. The symtable serves this purpose, as well as ensuring that variables declared within a narrow scope don&rsquo;t automatically become globals (after all, this isn&rsquo;t JavaScript):</p>
4311 <div class="highlight c"><pre><span></span><span class="k">struct</span> <span class="n">symtable</span> <span class="p">{</span>
4312     <span class="n">PyObject</span> <span class="o">*</span><span class="n">st_filename</span><span class="p">;</span>          <span class="cm">/* name of file being compiled,</span>
4313 <span class="cm">                                       decoded from the filesystem encoding */</span>
4314     <span class="k">struct</span> <span class="n">_symtable_entry</span> <span class="o">*</span><span class="n">st_cur</span><span class="p">;</span> <span class="cm">/* current symbol table entry */</span>
4315     <span class="k">struct</span> <span class="n">_symtable_entry</span> <span class="o">*</span><span class="n">st_top</span><span class="p">;</span> <span class="cm">/* symbol table entry for module */</span>
4316     <span class="n">PyObject</span> <span class="o">*</span><span class="n">st_blocks</span><span class="p">;</span>            <span class="cm">/* dict: map AST node addresses</span>
4317 <span class="cm">                                     *       to symbol table entries */</span>
4318     <span class="n">PyObject</span> <span class="o">*</span><span class="n">st_stack</span><span class="p">;</span>             <span class="cm">/* list: stack of namespace info */</span>
4319     <span class="n">PyObject</span> <span class="o">*</span><span class="n">st_global</span><span class="p">;</span>            <span class="cm">/* borrowed ref to st_top-&gt;ste_symbols */</span>
4320     <span class="kt">int</span> <span class="n">st_nblocks</span><span class="p">;</span>                 <span class="cm">/* number of blocks used. kept for</span>
4321 <span class="cm">                                       consistency with the corresponding</span>
4322 <span class="cm">                                       compiler structure */</span>
4323     <span class="n">PyObject</span> <span class="o">*</span><span class="n">st_private</span><span class="p">;</span>           <span class="cm">/* name of current class or NULL */</span>
4324     <span class="n">PyFutureFeatures</span> <span class="o">*</span><span class="n">st_future</span><span class="p">;</span>    <span class="cm">/* module&#39;s future features that affect</span>
4325 <span class="cm">                                       the symbol table */</span>
4326     <span class="kt">int</span> <span class="n">recursion_depth</span><span class="p">;</span>            <span class="cm">/* current recursion depth */</span>
4327     <span class="kt">int</span> <span class="n">recursion_limit</span><span class="p">;</span>            <span class="cm">/* recursion limit */</span>
4328 <span class="p">};</span>
4329 </pre></div>
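<p>The scoping rule described above can be checked with the standard library&rsquo;s <code>symtable</code> module. In the sketch below, a name assigned inside a function is recorded as local to that function&rsquo;s table and never appears in the module&rsquo;s table. A name the function only reads is treated as global. The names <code>resolve_names</code>, <code>helper</code>, and <code>CONFIG</code> are made up for the example:</p>
<div class="highlight python"><pre>
import symtable

SOURCE = """
def resolve_names():
    helper = len(CONFIG)
    return helper
"""

module_table = symtable.symtable(SOURCE, "example.py", "exec")
# The module table has one child table: the block for resolve_names()
function_table = module_table.get_children()[0]

helper = function_table.lookup("helper")
config = function_table.lookup("CONFIG")

print(function_table.get_name())                    # resolve_names
print(helper.is_local(), helper.is_global())        # True False
print(config.is_global())                           # True
print("helper" in module_table.get_identifiers())   # False
</pre></div>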
4330 
4331 <p>Some of the symbol table API is exposed via <a href="https://docs.python.org/3/library/symtable.html">the <code>symtable</code> module</a> in the standard library. You can provide an expression or a module and receive a <code>symtable.SymbolTable</code> instance.</p>
4332 <p>To get a symbol table, you can provide a string with a Python expression and the <code>compile_type</code> of <code>"eval"</code>, or the source of a module, function, or class and the <code>compile_type</code> of <code>"exec"</code>.</p>
4333 <p>Looping over the symbols in the table, we can see their private fields and values:</p>
4334 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">symtable</span>
4335 <span class="gp">&gt;&gt;&gt; </span><span class="n">s</span> <span class="o">=</span> <span class="n">symtable</span><span class="o">.</span><span class="n">symtable</span><span class="p">(</span><span class="s1">&#39;b + 1&#39;</span><span class="p">,</span> <span class="n">filename</span><span class="o">=</span><span class="s1">&#39;test.py&#39;</span><span class="p">,</span> <span class="n">compile_type</span><span class="o">=</span><span class="s1">&#39;eval&#39;</span><span class="p">)</span>
4336 <span class="gp">&gt;&gt;&gt; </span><span class="p">[</span><span class="n">symbol</span><span class="o">.</span><span class="vm">__dict__</span> <span class="k">for</span> <span class="n">symbol</span> <span class="ow">in</span> <span class="n">s</span><span class="o">.</span><span class="n">get_symbols</span><span class="p">()]</span>
4337 <span class="go">[{&#39;_Symbol__name&#39;: &#39;b&#39;, &#39;_Symbol__flags&#39;: 6160, &#39;_Symbol__scope&#39;: 3, &#39;_Symbol__namespaces&#39;: ()}]</span>
4338 </pre></div>
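<p>The <code>__dict__</code> dump above shows private, name-mangled attributes whose raw values can vary between Python versions. The public methods of <code>symtable.Symbol</code> report the same information more readably. A name that is only read in a top-level expression, like <code>b</code>, is treated as an implicit global, so <code>is_global()</code> returns <code>True</code>:</p>
<div class="highlight python"><pre>
import symtable

table = symtable.symtable("b + 1", "test.py", "eval")

for symbol in table.get_symbols():
    # Public accessors instead of the private _Symbol__* attributes
    print(symbol.get_name(), symbol.is_referenced(), symbol.is_assigned(), symbol.is_global())
# b True False True
</pre></div>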
4339 
4340 <p>The C code behind this is all within <code>Python/symtable.c</code> and the primary interface is the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L262"><code>PySymtable_BuildObject()</code></a> function.</p>
4341 <p>Similar to the top-level AST function we covered earlier, the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L262"><code>PySymtable_BuildObject()</code></a> function switches between the possible <code>mod_ty</code> types (Module, Expression, Interactive, Suite, FunctionType) and visits each of the statements inside them.</p>
4342 <p>Remember, <code>mod_ty</code> is an AST instance, so the function will now recursively explore the nodes and branches of the tree and add entries to the symtable:</p>
4343 <div class="highlight c"><pre><span></span><span class="k">struct</span> <span class="n">symtable</span> <span class="o">*</span>
4344 <span class="nf">PySymtable_BuildObject</span><span class="p">(</span><span class="n">mod_ty</span> <span class="n">mod</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">filename</span><span class="p">,</span> <span class="n">PyFutureFeatures</span> <span class="o">*</span><span class="n">future</span><span class="p">)</span>
4345 <span class="p">{</span>
4346 <span class="hll">    <span class="k">struct</span> <span class="n">symtable</span> <span class="o">*</span><span class="n">st</span> <span class="o">=</span> <span class="n">symtable_new</span><span class="p">();</span>
4347 </span>    <span class="n">asdl_seq</span> <span class="o">*</span><span class="n">seq</span><span class="p">;</span>
4348     <span class="kt">int</span> <span class="n">i</span><span class="p">;</span>
4349     <span class="n">PyThreadState</span> <span class="o">*</span><span class="n">tstate</span><span class="p">;</span>
4350     <span class="kt">int</span> <span class="n">recursion_limit</span> <span class="o">=</span> <span class="n">Py_GetRecursionLimit</span><span class="p">();</span>
4351 <span class="p">...</span>
4352     <span class="n">st</span><span class="o">-&gt;</span><span class="n">st_top</span> <span class="o">=</span> <span class="n">st</span><span class="o">-&gt;</span><span class="n">st_cur</span><span class="p">;</span>
4353     <span class="k">switch</span> <span class="p">(</span><span class="n">mod</span><span class="o">-&gt;</span><span class="n">kind</span><span class="p">)</span> <span class="p">{</span>
4354     <span class="k">case</span> <span class="nl">Module_kind</span><span class="p">:</span>
4355         <span class="n">seq</span> <span class="o">=</span> <span class="n">mod</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">Module</span><span class="p">.</span><span class="n">body</span><span class="p">;</span>
4356         <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">asdl_seq_LEN</span><span class="p">(</span><span class="n">seq</span><span class="p">);</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
4357 <span class="hll">            <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">symtable_visit_stmt</span><span class="p">(</span><span class="n">st</span><span class="p">,</span>
4358 </span>                        <span class="p">(</span><span class="n">stmt_ty</span><span class="p">)</span><span class="n">asdl_seq_GET</span><span class="p">(</span><span class="n">seq</span><span class="p">,</span> <span class="n">i</span><span class="p">)))</span>
4359                 <span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
4360         <span class="k">break</span><span class="p">;</span>
4361     <span class="k">case</span> <span class="nl">Expression_kind</span><span class="p">:</span>
4362         <span class="p">...</span>
4363     <span class="k">case</span> <span class="nl">Interactive_kind</span><span class="p">:</span>
4364         <span class="p">...</span>
4365     <span class="k">case</span> <span class="nl">Suite_kind</span><span class="p">:</span>
4366         <span class="p">...</span>
4367     <span class="k">case</span> <span class="nl">FunctionType_kind</span><span class="p">:</span>
4368         <span class="p">...</span>
4369     <span class="p">}</span>
4370     <span class="p">...</span>
4371 <span class="p">}</span>
4372 </pre></div>
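<p>The <code>mod_ty</code> kinds in that <code>switch</code> correspond to the root node classes that the <code>ast</code> module returns for each parsing mode. <code>Suite</code> has no matching parsing mode, so it isn&rsquo;t shown. This quick sketch assumes Python 3.8 or later, since the <code>"func_type"</code> mode was added then:</p>
<div class="highlight python"><pre>
import ast

# Each parsing mode produces a different root node, i.e. a different mod_ty kind
roots = {
    "exec": ast.parse("x = 1", mode="exec"),
    "eval": ast.parse("x + 1", mode="eval"),
    "single": ast.parse("x = 1", mode="single"),
    "func_type": ast.parse("(int, str) -> bool", mode="func_type"),
}

for mode, root in roots.items():
    print(mode, type(root).__name__)
# exec Module
# eval Expression
# single Interactive
# func_type FunctionType
</pre></div>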
4373 
4374 <p>So for a module, <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L262"><code>PySymtable_BuildObject()</code></a> will loop through each statement in the module and call <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L1176"><code>symtable_visit_stmt()</code></a>. The <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L1176"><code>symtable_visit_stmt()</code></a> function is a huge <code>switch</code> statement with a case for each statement type (defined in <code>Parser/Python.asdl</code>).</p>
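<p>In pure Python, the closest analogue to this per-statement <code>switch</code> is the dispatch that <code>ast.NodeVisitor</code> performs. It calls a method named after the node class, such as <code>visit_FunctionDef()</code>. The sketch below is a toy, hypothetical <code>DefinitionCollector</code>, not CPython&rsquo;s actual symbol table logic. It only records names bound by function and class definitions:</p>
<div class="highlight python"><pre>
import ast

SOURCE = """
def resolve_names():
    pass

class Resolver:
    def resolve(self):
        pass
"""


class DefinitionCollector(ast.NodeVisitor):
    """Toy analog of symtable_visit_stmt(): one handler per statement type."""

    def __init__(self):
        self.defined = []

    def visit_FunctionDef(self, node):
        # Like the FunctionDef_kind case: record the name, then visit the body
        self.defined.append(("function", node.name))
        self.generic_visit(node)

    def visit_ClassDef(self, node):
        self.defined.append(("class", node.name))
        self.generic_visit(node)


collector = DefinitionCollector()
collector.visit(ast.parse(SOURCE))
print(collector.defined)
# [('function', 'resolve_names'), ('class', 'Resolver'), ('function', 'resolve')]
</pre></div>
<p>Unlike the real symbol table, this version tracks no scopes or flags. It only shows how work is dispatched by node type.</p>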
4375 <p>Each statement type has its own logic. For example, a function definition has particular logic for:</p>
4376 <ol>
4377 <li>If the recursion depth is beyond the limit, raise a recursion depth error</li>
4378 <li>The name of the function to be added as a local variable</li>
4379 <li>The default values for positional arguments to be resolved</li>
4380 <li>The default values for keyword-only arguments to be resolved</li>
4381 <li>Any annotations for the arguments or the return type are resolved</li>
4382 <li>Any function decorators are resolved</li>
4383 <li>A new block for the function to be entered with <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/symtable.c#L973"><code>symtable_enter_block()</code></a></li>
4384 <li>The arguments are visited</li>
4385 <li>The body of the function is visited</li>
4386 </ol>
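<p>The order of these steps matters. The defaults (steps 3 and 4) are visited before the function&rsquo;s own block is entered (step 7), so any names they use are recorded in the enclosing scope rather than in the function&rsquo;s table. A sketch with the <code>symtable</code> module shows this. <code>make_counter</code> and <code>DEFAULT_LIMIT</code> are hypothetical names:</p>
<div class="highlight python"><pre>
import symtable

SOURCE = """
def make_counter(limit=DEFAULT_LIMIT):
    count = limit + 1
    return count
"""

module_table = symtable.symtable(SOURCE, "example.py", "exec")
function_table = module_table.get_children()[0]

# The default expression is resolved in the module's table...
print("DEFAULT_LIMIT" in module_table.get_identifiers())    # True
# ...while the parameter and body names live in the function's table
print("DEFAULT_LIMIT" in function_table.get_identifiers())  # False
print(sorted(function_table.get_locals()))                  # ['count', 'limit']
</pre></div>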
4387 <div class="alert alert-primary" role="alert">
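4388 <p><strong>Note:</strong> If you&rsquo;ve ever wondered why a mutable default argument is shared between calls, this function gives a hint. The defaults are visited in the enclosing scope, before the function&rsquo;s own block is entered, because they&rsquo;re evaluated once, when the <code>def</code> statement runs, and the resulting objects are stored on the function object. No extra work is done to copy them for each call.</p>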
4389 </div>
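<p>Shared mutable defaults are easy to observe from Python. The default object is created once, when the <code>def</code> statement runs, and kept in the function&rsquo;s <code>__defaults__</code> tuple, so every call that omits the argument reuses it. <code>append_item()</code> and <code>append_item_fixed()</code> are hypothetical example functions:</p>
<div class="highlight python"><pre>
def append_item(item, bucket=[]):
    # The same list object is reused by every call that omits `bucket`
    bucket.append(item)
    return bucket


first = append_item(1)
second = append_item(2)

print(second)                                # [1, 2]
print(first is second)                       # True
print(append_item.__defaults__[0] is first)  # True


def append_item_fixed(item, bucket=None):
    # Common idiom: use None as a sentinel and create a new list per call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
</pre></div>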
4390 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
4391 <span class="nf">symtable_visit_stmt</span><span class="p">(</span><span class="k">struct</span> <span class="n">symtable</span> <span class="o">*</span><span class="n">st</span><span class="p">,</span> <span class="n">stmt_ty</span> <span class="n">s</span><span class="p">)</span>
4392 <span class="p">{</span>
4393 <span class="hll">    <span class="k">if</span> <span class="p">(</span><span class="o">++</span><span class="n">st</span><span class="o">-&gt;</span><span class="n">recursion_depth</span> <span class="o">&gt;</span> <span class="n">st</span><span class="o">-&gt;</span><span class="n">recursion_limit</span><span class="p">)</span> <span class="p">{</span>                          <span class="c1">// 1.</span>
4394 </span>        <span class="n">PyErr_SetString</span><span class="p">(</span><span class="n">PyExc_RecursionError</span><span class="p">,</span>
4395                         <span class="s">&quot;maximum recursion depth exceeded during compilation&quot;</span><span class="p">);</span>
4396         <span class="n">VISIT_QUIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
4397     <span class="p">}</span>
4398     <span class="k">switch</span> <span class="p">(</span><span class="n">s</span><span class="o">-&gt;</span><span class="n">kind</span><span class="p">)</span> <span class="p">{</span>
4399     <span class="k">case</span> <span class="nl">FunctionDef_kind</span><span class="p">:</span>
4400 <span class="hll">        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">symtable_add_def</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">DEF_LOCAL</span><span class="p">))</span>            <span class="c1">// 2.</span>
4401 </span>            <span class="n">VISIT_QUIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
4402 <span class="hll">        <span class="k">if</span> <span class="p">(</span><span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="o">-&gt;</span><span class="n">defaults</span><span class="p">)</span>                                    <span class="c1">// 3.</span>
4403 </span>            <span class="n">VISIT_SEQ</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">expr</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="o">-&gt;</span><span class="n">defaults</span><span class="p">);</span>
4404 <span class="hll">        <span class="k">if</span> <span class="p">(</span><span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="o">-&gt;</span><span class="n">kw_defaults</span><span class="p">)</span>                                 <span class="c1">// 4.</span>
4405 </span>            <span class="n">VISIT_SEQ_WITH_NULL</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">expr</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="o">-&gt;</span><span class="n">kw_defaults</span><span class="p">);</span>
4406 <span class="hll">        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">symtable_visit_annotations</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="p">,</span>           <span class="c1">// 5.</span>
4407 </span>                                        <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">returns</span><span class="p">))</span>
4408             <span class="n">VISIT_QUIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
4409 <span class="hll">        <span class="k">if</span> <span class="p">(</span><span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">decorator_list</span><span class="p">)</span>                                    <span class="c1">// 6.</span>
4410 </span>            <span class="n">VISIT_SEQ</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">expr</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">decorator_list</span><span class="p">);</span>
4411 <span class="hll">        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">symtable_enter_block</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">name</span><span class="p">,</span>                    <span class="c1">// 7.</span>
4412 </span>                                  <span class="n">FunctionBlock</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">s</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">lineno</span><span class="p">,</span>
4413                                   <span class="n">s</span><span class="o">-&gt;</span><span class="n">col_offset</span><span class="p">))</span>
4414             <span class="n">VISIT_QUIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
4415 <span class="hll">        <span class="n">VISIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">arguments</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">args</span><span class="p">);</span>                            <span class="c1">// 8.</span>
4416 </span><span class="hll">        <span class="n">VISIT_SEQ</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">stmt</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">FunctionDef</span><span class="p">.</span><span class="n">body</span><span class="p">);</span>                             <span class="c1">// 9.</span>
4417 </span>        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">symtable_exit_block</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="n">s</span><span class="p">))</span>
4418             <span class="n">VISIT_QUIT</span><span class="p">(</span><span class="n">st</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
4419         <span class="k">break</span><span class="p">;</span>
4420     <span class="k">case</span> <span class="nl">ClassDef_kind</span><span class="p">:</span> <span class="p">{</span>
4421         <span class="p">...</span>
4422     <span class="p">}</span>
4423     <span class="k">case</span> <span class="nl">Return_kind</span><span class="p">:</span>
4424         <span class="p">...</span>
4425     <span class="k">case</span> <span class="nl">Delete_kind</span><span class="p">:</span>
4426         <span class="p">...</span>
4427     <span class="k">case</span> <span class="nl">Assign_kind</span><span class="p">:</span>
4428         <span class="p">...</span>
4429     <span class="k">case</span> <span class="nl">AnnAssign_kind</span><span class="p">:</span>
4430         <span class="p">...</span>
4431 </pre></div>
4432 
4433 <p>Once the symtable has been created, it is returned so that the compiler can use it.</p>
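<p>As a rough illustration, the standard library&rsquo;s <code>symtable</code> module exposes the symbol tables that this pass builds. The sketch below uses a small, hypothetical source string containing one function. It checks two things: that the <code>def</code> statement binds a name that opens a new namespace, and that the function&rsquo;s child table records its own local names:</p>

```python
import symtable

# A small module containing a single function definition.
SOURCE = """\
def example():
    a = 1
    b = a + 1
    return b
"""

# 'exec' mode because the source is a module rather than one expression.
module_table = symtable.symtable(SOURCE, "test.py", "exec")

# The def statement binds "example" in the module and opens a new namespace.
example_symbol = module_table.lookup("example")
print(example_symbol.is_namespace())  # True

# The function body gets its own child table with its own local names.
function_table = module_table.get_children()[0]
print(function_table.get_name(), sorted(function_table.get_locals()))  # example ['a', 'b']
```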
4434 <h4 id="core-compilation-process">Core Compilation Process</h4>
4435 <p>Now that <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a> has a compiler state, a symtable, and a module in the form of the AST, the actual compilation can begin.</p>
4436 <p>The purpose of the core compiler is to:</p>
4437 <ul>
4438 <li>Convert the state, symtable, and AST into a <a href="https://en.wikipedia.org/wiki/Control-flow_graph">Control-Flow-Graph (CFG)</a></li>
4439 <li>Protect the execution stage from runtime exceptions by catching any logic and code errors and raising them here</li>
4440 </ul>
4441 <p>You can invoke the CPython compiler from Python code with the built-in function <code>compile()</code>. It returns a code object:</p>
4442 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="nb">compile</span><span class="p">(</span><span class="s1">&#39;b+1&#39;</span><span class="p">,</span> <span class="s1">&#39;test.py&#39;</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s1">&#39;eval&#39;</span><span class="p">)</span>
4443 <span class="go">&lt;code object &lt;module&gt; at 0x10f222780, file &quot;test.py&quot;, line 1&gt;</span>
4444 </pre></div>
4445 
4446 <p>As with the <code>symtable()</code> function, a single expression should be compiled with a mode of <code>'eval'</code>, and a module, function, or class with a mode of <code>'exec'</code>.</p>
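<p>Here is a short sketch of both modes, plus a helper, <code>compile_error()</code> (a hypothetical name used only here), that reports whether <code>compile()</code> rejects a piece of source. It shows roughly what catching errors at the compilation stage means in practice. A misplaced <code>return</code> is rejected before anything runs, while an undefined name still compiles and would only fail at runtime. Some <code>SyntaxError</code> messages vary between Python versions, so only the stable one is printed:</p>

```python
# 'eval' mode: one expression; the code object evaluates to a value.
expr_code = compile("b + 1", "test.py", mode="eval")
print(eval(expr_code, {"b": 41}))  # 42

# 'exec' mode: a sequence of statements, run for their side effects.
module_code = compile("def double(x):\n    return x * 2\n", "test.py", mode="exec")
namespace = {}
exec(module_code, namespace)
print(namespace["double"](21))  # 42


def compile_error(source, mode="exec"):
    """Return the SyntaxError message raised by compile(), or None."""
    try:
        compile(source, "test.py", mode)
    except SyntaxError as err:
        return err.msg
    return None


# A statement is not an expression, so 'eval' mode rejects it.
print(compile_error("x = 1", mode="eval") is not None)  # True
# A misplaced statement is rejected before any code runs...
print(compile_error("return 1"))  # 'return' outside function
# ...but an undefined name is only discovered when the code executes.
print(compile_error("print(undefined_name)"))  # None
```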
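4447 <p>Assign the result to a variable, <code>co</code>, and you can find the compiled bytecode in the code object&rsquo;s <code>co_code</code> attribute:</p>
4448 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">co</span> <span class="o">=</span> <span class="nb">compile</span><span class="p">(</span><span class="s1">&#39;b+1&#39;</span><span class="p">,</span> <span class="s1">&#39;test.py&#39;</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s1">&#39;eval&#39;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">co</span><span class="o">.</span><span class="n">co_code</span>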
4449 <span class="go">b&#39;e\x00d\x00\x17\x00S\x00&#39;</span>
4450 </pre></div>
4451 
4452 <p>The standard library also has a <code>dis</code> module, which disassembles bytecode instructions and can either print them on the screen or give you an iterator of <code>Instruction</code> instances.</p>
4453 <p>If you import <code>dis</code> and give the <code>dis()</code> function the code object&rsquo;s <code>co_code</code> attribute, it disassembles the bytes and prints the instructions on the REPL. Because it receives only the raw bytes, it can&rsquo;t look up the names and constants, so their arguments appear as plain indexes:</p>
4454 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">dis</span>
4455 <span class="gp">&gt;&gt;&gt; </span><span class="n">dis</span><span class="o">.</span><span class="n">dis</span><span class="p">(</span><span class="n">co</span><span class="o">.</span><span class="n">co_code</span><span class="p">)</span>
4456 <span class="go">          0 LOAD_NAME                0 (0)</span>
4457 <span class="go">          2 LOAD_CONST               0 (0)</span>
4458 <span class="go">          4 BINARY_ADD</span>
4459 <span class="go">          6 RETURN_VALUE</span>
4460 </pre></div>
4461 
4462 <p><code>LOAD_NAME</code>, <code>LOAD_CONST</code>, <code>BINARY_ADD</code>, and <code>RETURN_VALUE</code> are all bytecode instructions. They&rsquo;re called bytecode because each opcode is one byte long in binary form. However, since Python 3.6, every instruction has been stored as a two-byte <code>word</code> (an opcode byte followed by an argument byte), so they&rsquo;re technically wordcode, not bytecode.</p>
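<p>To see the two-byte layout yourself, you can walk <code>co_code</code> in pairs and look up each opcode in <code>dis.opname</code>. This is a simplified sketch, not a full decoder. It ignores <code>EXTENDED_ARG</code> prefixes, and on Python 3.11 and later the raw bytes also contain inline cache entries. On 3.6 through 3.10 the pairs should match the four instructions above; on newer versions the output will differ:</p>

```python
import dis

co = compile("b+1", "test.py", mode="eval")
raw = co.co_code

# Since Python 3.6, every instruction occupies two bytes, so the
# raw bytecode always has an even length.
print(len(raw) % 2 == 0)  # True

# Decode the raw bytes in pairs: (opcode byte, argument byte).
# Simplification: EXTENDED_ARG prefixes are not combined, and on
# Python 3.11+ inline CACHE entries show up as extra pairs.
decoded = []
for offset in range(0, len(raw), 2):
    opcode, arg = raw[offset], raw[offset + 1]
    decoded.append((offset, dis.opname[opcode], arg))

for offset, name, arg in decoded:
    print(offset, name, arg)
```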
4463 <p>The <a href="https://docs.python.org/3/library/dis.html#python-bytecode-instructions">full list of bytecode instructions</a> is available for each version of Python, and it does change between versions. For example, Python 3.7 introduced the <code>LOAD_METHOD</code> and <code>CALL_METHOD</code> instructions to speed up method calls.</p>
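<p>Two related checks can be sketched with the <code>dis</code> module. First, passing the whole code object to <code>dis.dis()</code> lets it resolve names and constants. Second, <code>dis.get_instructions()</code> shows which opcodes your interpreter emits for a method call. On Python 3.7 through 3.10 the method is loaded with <code>LOAD_METHOD</code>. Newer versions have reorganized these instructions, so the printed names depend on the version you run. The function <code>call_method()</code> is a hypothetical example:</p>

```python
import dis

co = compile("b+1", "test.py", mode="eval")
# Passing the code object (rather than co.co_code) lets dis resolve the
# arguments of LOAD_NAME and LOAD_CONST to the real name and constant.
dis.dis(co)


def call_method(obj):
    return obj.method()


# get_instructions() yields Instruction objects instead of printing them.
instructions = list(dis.get_instructions(call_method))
print([ins.opname for ins in instructions])

# Whatever the opcode is called in this version, its argval is the method name.
method_loads = [ins.opname for ins in instructions if ins.argval == "method"]
print(method_loads)
```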
4464 <p>In an earlier section, we explored the <code>instaviz</code> package. It includes a visualization of the code object produced by running the compiler, and it also displays the bytecode operations inside the code object.</p>
4465 <p>Execute instaviz again to see the code object and bytecode for a function defined on the REPL:</p>
4466 <div class="highlight python repl"><span class="repl-toggle" title="Toggle REPL prompts and output">&gt;&gt;&gt;</span><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">instaviz</span>
4467 <span class="gp">&gt;&gt;&gt; </span><span class="k">def</span> <span class="nf">example</span><span class="p">():</span>
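4468 <span class="gp">... </span>    <span class="n">a</span> <span class="o">=</span> <span class="mi">1</span>
4469 <span class="gp">... </span>    <span class="n">b</span> <span class="o">=</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">1</span>
4470 <span class="gp">... </span>    <span class="k">return</span> <span class="n">b</span>
<span class="gp">... </span>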
4471 <span class="gp">&gt;&gt;&gt; </span><span class="n">instaviz</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">example</span><span class="p">)</span>
4472 </pre></div>
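<p>If <code>instaviz</code> isn&rsquo;t installed, much of the same information is available from the standard library. This sketch reads a few attributes of the function&rsquo;s <code>__code__</code> object and disassembles it. The bytecode listing itself varies between Python versions:</p>

```python
import dis


def example():
    a = 1
    b = a + 1
    return b


code = example.__code__   # the code object the compiler produced for the body
print(code.co_name)       # example
print(code.co_varnames)   # ('a', 'b')
dis.dis(example)          # bytecode listing, version-dependent
```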
4473 
4474 <p>Now jump into <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1782"><code>compiler_mod()</code></a>, a function that switches to different compiler functions depending on the module type. We&rsquo;ll assume that <code>mod</code> is a <code>Module</code>. The module is compiled into the compiler state, and then <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L5971"><code>assemble()</code></a> is run to create a <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/code.h#L69"><code>PyCodeObject</code></a>.</p>
4475 <p>The new code object is returned to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L312"><code>PyAST_CompileObject()</code></a> and sent on for execution:</p>
4476 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyCodeObject</span> <span class="o">*</span>
4477 <span class="nf">compiler_mod</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">mod_ty</span> <span class="n">mod</span><span class="p">)</span>
4478 <span class="p">{</span>
4479 <span class="hll">    <span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span><span class="p">;</span>
4480 </span>    <span class="kt">int</span> <span class="n">addNone</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
4481     <span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">module</span><span class="p">;</span>
4482     <span class="p">...</span>
4483     <span class="k">switch</span> <span class="p">(</span><span class="n">mod</span><span class="o">-&gt;</span><span class="n">kind</span><span class="p">)</span> <span class="p">{</span>
4484     <span class="k">case</span> <span class="nl">Module_kind</span><span class="p">:</span>
4485 <span class="hll">        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">compiler_body</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">mod</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">Module</span><span class="p">.</span><span class="n">body</span><span class="p">))</span> <span class="p">{</span>
4486 </span>            <span class="n">compiler_exit_scope</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
4487             <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
4488         <span class="p">}</span>
4489         <span class="k">break</span><span class="p">;</span>
4490     <span class="k">case</span> <span class="nl">Interactive_kind</span><span class="p">:</span>
4491         <span class="p">...</span>
4492     <span class="k">case</span> <span class="nl">Expression_kind</span><span class="p">:</span>
4493         <span class="p">...</span>
4494     <span class="k">case</span> <span class="nl">Suite_kind</span><span class="p">:</span>
4495         <span class="p">...</span>
4496     <span class="p">...</span>
4497 <span class="hll">    <span class="n">co</span> <span class="o">=</span> <span class="n">assemble</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">addNone</span><span class="p">);</span>
4498 </span>    <span class="n">compiler_exit_scope</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
4499 <span class="hll">    <span class="k">return</span> <span class="n">co</span><span class="p">;</span>
4500 </span><span class="p">}</span>
4501 </pre></div>
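<p>The <code>mode</code> argument of <code>compile()</code> decides which kind of <code>mod</code> node reaches <code>compiler_mod()</code>. A minimal way to see this from Python is to parse short sources with <code>ast.parse()</code>, which accepts the same mode names. <code>'exec'</code> gives a <code>Module</code>, <code>'eval'</code> an <code>Expression</code>, and <code>'single'</code> an <code>Interactive</code> node. These match the <code>Module_kind</code>, <code>Expression_kind</code>, and <code>Interactive_kind</code> cases above:</p>

```python
import ast

# compile() and ast.parse() accept the same mode names; each mode produces
# a different top-level mod node, which is what compiler_mod() switches on.
examples = [("exec", "x = 1"), ("eval", "x + 1"), ("single", "x = 1")]
kinds = {mode: type(ast.parse(source, mode=mode)).__name__ for mode, source in examples}
print(kinds)  # {'exec': 'Module', 'eval': 'Expression', 'single': 'Interactive'}
```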
4502 
4503 <p>The <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1743"><code>compiler_body()</code></a> function checks some optimization settings (for example, whether to keep the module docstring), then loops over each statement in the module and visits it, similar to how the <code>symtable</code> functions worked:</p>
4504 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
4505 <span class="nf">compiler_body</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">asdl_seq</span> <span class="o">*</span><span class="n">stmts</span><span class="p">)</span>
4506 <span class="p">{</span>
4507     <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
4508     <span class="n">stmt_ty</span> <span class="n">st</span><span class="p">;</span>
4509     <span class="n">PyObject</span> <span class="o">*</span><span class="n">docstring</span><span class="p">;</span>
4510     <span class="p">...</span>
4511 <span class="hll">    <span class="k">for</span> <span class="p">(;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">asdl_seq_LEN</span><span class="p">(</span><span class="n">stmts</span><span class="p">);</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
4512 </span><span class="hll">        <span class="n">VISIT</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">stmt</span><span class="p">,</span> <span class="p">(</span><span class="n">stmt_ty</span><span class="p">)</span><span class="n">asdl_seq_GET</span><span class="p">(</span><span class="n">stmts</span><span class="p">,</span> <span class="n">i</span><span class="p">));</span>
4513 </span>    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
4514 <span class="p">}</span>
4515 </pre></div>
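<p>Seen from Python, a module&rsquo;s <code>body</code> is simply an ordered list of statement nodes. This sketch uses a hypothetical three-statement module and iterates over its body the way <code>compiler_body()</code> does, recording each statement&rsquo;s type instead of compiling it:</p>

```python
import ast

SOURCE = """\
total = 0
def add_one(x):
    return x + 1
for i in range(3):
    total = add_one(total)
"""

module = ast.parse(SOURCE)

# Like compiler_body(): take the module's statements one at a time, in order.
# Each stmt here plays the role of asdl_seq_GET(stmts, i) in the C loop.
statement_kinds = []
for stmt in module.body:
    statement_kinds.append(type(stmt).__name__)

print(statement_kinds)  # ['Assign', 'FunctionDef', 'For']
```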
4516 
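4517 <p>Each statement is fetched from the sequence with <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/asdl.h#L32"><code>asdl_seq_GET()</code></a>, which returns the AST node at the given index. The node&rsquo;s type is examined afterward, when the node is visited.</p>
4518 <p>The <code>VISIT</code> macro uses the <code>##</code> token-pasting operator to call the matching <code>compiler_visit_TYPE()</code> function in <code>Python/compile.c</code>, where <code>TYPE</code> is the node category, such as <code>stmt</code> or <code>expr</code>:</p>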
4519 <div class="highlight c"><pre><span></span><span class="cp">#define VISIT(C, TYPE, V) {\</span>
4520 <span class="cp">    if (!compiler_visit_ ## TYPE((C), (V))) \</span>
4521 <span class="cp">        return 0; \</span>
4522 <span class="cp">}</span>
4523 </pre></div>
4524 
4525 <p>For a <code>stmt</code> (the category for a statement), the compiler then drops into <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L3310"><code>compiler_visit_stmt()</code></a> and switches through all of the potential statement types found in <code>Parser/Python.asdl</code>:</p>
4526 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
4527 <span class="nf">compiler_visit_stmt</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">stmt_ty</span> <span class="n">s</span><span class="p">)</span>
4528 <span class="p">{</span>
4529     <span class="n">Py_ssize_t</span> <span class="n">i</span><span class="p">,</span> <span class="n">n</span><span class="p">;</span>
4530 
4531     <span class="cm">/* Always assign a lineno to the next instruction for a stmt. */</span>
4532     <span class="n">c</span><span class="o">-&gt;</span><span class="n">u</span><span class="o">-&gt;</span><span class="n">u_lineno</span> <span class="o">=</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">lineno</span><span class="p">;</span>
4533     <span class="n">c</span><span class="o">-&gt;</span><span class="n">u</span><span class="o">-&gt;</span><span class="n">u_col_offset</span> <span class="o">=</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">col_offset</span><span class="p">;</span>
4534     <span class="n">c</span><span class="o">-&gt;</span><span class="n">u</span><span class="o">-&gt;</span><span class="n">u_lineno_set</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
4535 
4536     <span class="k">switch</span> <span class="p">(</span><span class="n">s</span><span class="o">-&gt;</span><span class="n">kind</span><span class="p">)</span> <span class="p">{</span>
4537     <span class="k">case</span> <span class="nl">FunctionDef_kind</span><span class="p">:</span>
4538         <span class="k">return</span> <span class="n">compiler_function</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
4539     <span class="k">case</span> <span class="nl">ClassDef_kind</span><span class="p">:</span>
4540         <span class="k">return</span> <span class="n">compiler_class</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">s</span><span class="p">);</span>
4541     <span class="p">...</span>
4542     <span class="k">case</span> <span class="nl">For_kind</span><span class="p">:</span>
4543         <span class="k">return</span> <span class="n">compiler_for</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">s</span><span class="p">);</span>
4544     <span class="p">...</span>
4545     <span class="p">}</span>
4546 
4547     <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
4548 <span class="p">}</span>
4549 </pre></div>
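<p>The standard library&rsquo;s <code>ast.NodeVisitor</code> uses a similar dispatch idea: <code>visit()</code> looks up a method named <code>visit_</code> plus the node&rsquo;s class name. The toy visitor below (<code>StatementLogger</code> is a hypothetical name) handles only <code>FunctionDef</code> and <code>For</code> nodes. It is an analogy for the <code>switch</code> in <code>compiler_visit_stmt()</code>, not a reimplementation of it:</p>

```python
import ast


class StatementLogger(ast.NodeVisitor):
    """Toy visitor: NodeVisitor.visit() picks a method from the node's class
    name, loosely like compiler_visit_stmt() switching on the statement kind."""

    def __init__(self):
        self.seen = []

    def visit_FunctionDef(self, node):
        self.seen.append(("FunctionDef", node.name))
        self.generic_visit(node)  # continue into the function body

    def visit_For(self, node):
        self.seen.append(("For", node.target.id))
        self.generic_visit(node)  # continue into the loop body and else clause


logger = StatementLogger()
logger.visit(ast.parse("def f():\n    for i in range(3):\n        pass\n"))
print(logger.seen)  # [('FunctionDef', 'f'), ('For', 'i')]
```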
4550 
4551 <p>As an example, let&rsquo;s focus on the <code>For</code> statement, which in Python looks like this:</p>
4552 <div class="highlight python"><pre><span></span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">iterable</span><span class="p">:</span>
4553     <span class="c1"># block</span>
4554 <span class="k">else</span><span class="p">:</span>  <span class="c1"># optional; runs if the loop ends without break</span>
4555     <span class="c1"># block</span>
4556 </pre></div>
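<p>The <code>else</code> clause is easy to misread. This small sketch uses a hypothetical <code>find_first_even()</code> helper to show that the clause runs whenever the loop finishes without a <code>break</code>, including when the iterable is empty:</p>

```python
def find_first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            break
    else:
        # Runs only if the loop was not ended by break,
        # which includes the case of an empty iterable.
        return None
    return n


print(find_first_even([1, 3, 4, 5]))  # 4
print(find_first_even([1, 3]))        # None
print(find_first_even([]))            # None
```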
4557 
4558 <p>If the statement is a <code>For</code> type, it calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L2651"><code>compiler_for()</code></a>. There is an equivalent <code>compiler_*()</code> function for all of the statement and expression types. The more straightforward types create the bytecode instructions inline, while some of the more complex statement types call other functions.</p>
4559 <p>Many of the statements can have sub-statements. A <code>for</code> loop has a body, but you can also have complex expressions in the assignment and the iterator.</p>
4560 <p>The compiler&rsquo;s <code>compiler_*()</code> functions send blocks to the compiler state. These blocks contain instructions. The instruction data structure in <code>Python/compile.c</code> holds the opcode, any argument, the target block (if this is a jump instruction), and the line number.</p>
4561 <p>Jump instructions are used to &ldquo;jump&rdquo; from one operation to another, and they can be either absolute or relative. An absolute jump specifies the exact offset of the target operation in the compiled code object, whereas a relative jump specifies the target relative to the current position. The <code>i_jabs</code> and <code>i_jrel</code> bit fields record which kind a jump is:</p>
4562 <div class="highlight c"><pre><span></span><span class="k">struct</span> <span class="n">instr</span> <span class="p">{</span>
4563     <span class="kt">unsigned</span> <span class="nl">i_jabs</span> <span class="p">:</span> <span class="mi">1</span><span class="p">;</span>
4564     <span class="kt">unsigned</span> <span class="nl">i_jrel</span> <span class="p">:</span> <span class="mi">1</span><span class="p">;</span>
4565     <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">i_opcode</span><span class="p">;</span>
4566     <span class="kt">int</span> <span class="n">i_oparg</span><span class="p">;</span>
4567     <span class="k">struct</span> <span class="n">basicblock_</span> <span class="o">*</span><span class="n">i_target</span><span class="p">;</span> <span class="cm">/* target block (if jump instruction) */</span>
4568     <span class="kt">int</span> <span class="n">i_lineno</span><span class="p">;</span>
4569 <span class="p">};</span>
4570 </pre></div>
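<p>You can list the jump instructions in a compiled <code>for</code> loop with <code>dis</code>. In this sketch, <code>dis.hasjrel</code> and <code>dis.hasjabs</code> identify the relative and absolute jump opcodes, and for those instructions <code>argval</code> holds the resolved target offset. The opcode names differ between versions (for example, Python 3.11 replaced <code>JUMP_ABSOLUTE</code> with a relative backward jump), so treat the output as version-specific:</p>

```python
import dis


def loop(items):
    for item in items:
        print(item)


# Opcodes whose argument is a jump target (relative or absolute).
jump_opcodes = set(dis.hasjrel) | set(dis.hasjabs)

# For jump instructions, dis reports the resolved target offset in argval.
jumps = [
    (ins.offset, ins.opname, ins.argval)
    for ins in dis.get_instructions(loop)
    if ins.opcode in jump_opcodes
]
for offset, opname, target in jumps:
    print(offset, opname, "->", target)
```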
4571 
4572 <p>A basic block (of type <code>basicblock</code>) contains the following fields:</p>
4573 <ul>
4574 <li>A <code>b_list</code> pointer, the link to a list of blocks for the compiler state</li>
4575 <li>A list of instructions <code>b_instr</code>, with both the allocated list size <code>b_ialloc</code>, and the number used <code>b_iused</code></li>
4576 <li>The next block after this one <code>b_next</code></li>
4577 <li>Whether the block has been &ldquo;seen&rdquo; by the assembler when traversing depth-first (<code>b_seen</code>)</li>
4578 <li>If this block has a <code>RETURN_VALUE</code> opcode (<code>b_return</code>)</li>
4579 <li>The depth of the stack when this block was entered (<code>b_startdepth</code>)</li>
4580 <li>The instruction offset for the block, computed by the assembler (<code>b_offset</code>)</li>
4581 </ul>
4582 <div class="highlight c"><pre><span></span><span class="k">typedef</span> <span class="k">struct</span> <span class="n">basicblock_</span> <span class="p">{</span>
4583     <span class="cm">/* Each basicblock in a compilation unit is linked via b_list in the</span>
4584 <span class="cm">       reverse order that the block are allocated.  b_list points to the next</span>
4585 <span class="cm">       block, not to be confused with b_next, which is next by control flow. */</span>
4586     <span class="k">struct</span> <span class="n">basicblock_</span> <span class="o">*</span><span class="n">b_list</span><span class="p">;</span>
4587     <span class="cm">/* number of instructions used */</span>
4588     <span class="kt">int</span> <span class="n">b_iused</span><span class="p">;</span>
4589     <span class="cm">/* length of instruction array (b_instr) */</span>
4590     <span class="kt">int</span> <span class="n">b_ialloc</span><span class="p">;</span>
4591     <span class="cm">/* pointer to an array of instructions, initially NULL */</span>
4592     <span class="k">struct</span> <span class="n">instr</span> <span class="o">*</span><span class="n">b_instr</span><span class="p">;</span>
4593     <span class="cm">/* If b_next is non-NULL, it is a pointer to the next</span>
4594 <span class="cm">       block reached by normal control flow. */</span>
4595     <span class="k">struct</span> <span class="n">basicblock_</span> <span class="o">*</span><span class="n">b_next</span><span class="p">;</span>
4596     <span class="cm">/* b_seen is used to perform a DFS of basicblocks. */</span>
4597     <span class="kt">unsigned</span> <span class="nl">b_seen</span> <span class="p">:</span> <span class="mi">1</span><span class="p">;</span>
4598     <span class="cm">/* b_return is true if a RETURN_VALUE opcode is inserted. */</span>
4599     <span class="kt">unsigned</span> <span class="nl">b_return</span> <span class="p">:</span> <span class="mi">1</span><span class="p">;</span>
4600     <span class="cm">/* depth of stack upon entry of block, computed by stackdepth() */</span>
4601     <span class="kt">int</span> <span class="n">b_startdepth</span><span class="p">;</span>
4602     <span class="cm">/* instruction offset for block, computed by assemble_jump_offsets() */</span>
4603     <span class="kt">int</span> <span class="n">b_offset</span><span class="p">;</span>
4604 <span class="p">}</span> <span class="n">basicblock</span><span class="p">;</span>
4605 </pre></div>
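<p>To make the <code>b_next</code> and <code>b_seen</code> fields concrete, here is a hypothetical Python model of a few <code>basicblock</code> fields. It includes a depth-first traversal that uses a &ldquo;seen&rdquo; flag so each block is visited once. The class and function names are illustrative assumptions. The traversal sketches the idea and is not the assembler&rsquo;s actual algorithm:</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Instr:
    """Hypothetical stand-in for struct instr: an opcode name and optional jump target."""
    opname: str
    target: Optional["BasicBlock"] = None


@dataclass(eq=False)
class BasicBlock:
    """Hypothetical stand-in for basicblock; only a few fields are modeled."""
    label: str
    instrs: List[Instr] = field(default_factory=list)  # like b_instr
    next: Optional["BasicBlock"] = None                 # like b_next (fall-through)
    seen: bool = False                                  # like b_seen (DFS flag)


def dfs_order(entry):
    """Visit blocks depth-first along fall-through and jump edges, once each."""
    order, stack = [], [entry]
    while stack:
        block = stack.pop()
        if block is None or block.seen:
            continue
        block.seen = True
        order.append(block.label)
        # Push jump targets first so the fall-through block is popped next.
        stack.extend(ins.target for ins in block.instrs if ins.target is not None)
        stack.append(block.next)
    return order


# A block layout resembling a simple for loop.
entry, start, cleanup, end = (BasicBlock(n) for n in ("entry", "start", "cleanup", "end"))
entry.next, start.next, cleanup.next = start, cleanup, end
start.instrs = [Instr("FOR_ITER", target=cleanup), Instr("JUMP_ABSOLUTE", target=start)]
order = dfs_order(entry)
print(order)  # ['entry', 'start', 'cleanup', 'end']
```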
4606 
4607 <p>The <code>For</code> statement is somewhere in the middle in terms of complexity. There are 15 steps in the compilation of a <code>For</code> statement with the <code>for &lt;target&gt; in &lt;iterator&gt;:</code> syntax:</p>
4608 <ol>
4609 <li>Create a new code block called <code>start</code>; this allocates memory and creates a <code>basicblock</code> pointer</li>
4610 <li>Create a new code block called <code>cleanup</code></li>
4611 <li>Create a new code block called <code>end</code></li>
4612 <li>Push a frame block of type <code>FOR_LOOP</code> to the stack with <code>start</code> as the entry block and <code>end</code> as the exit block</li>
4613 <li>Visit the iterator expression, which adds any operations for the iterator</li>
4614 <li>Add the <code>GET_ITER</code> operation to the compiler state</li>
4615 <li>Switch to the <code>start</code> block</li>
4616 <li>Call <code>ADDOP_JREL</code>, which calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1413"><code>compiler_addop_j()</code></a> to add the <code>FOR_ITER</code> operation with an argument of the <code>cleanup</code> block</li>
4617 <li>Visit the <code>target</code> and add any special code, like tuple unpacking, to the <code>start</code> block</li>
4618 <li>Visit each statement in the body of the for loop</li>
4619 <li>Call <code>ADDOP_JABS</code>, which calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1413"><code>compiler_addop_j()</code></a> to add the <code>JUMP_ABSOLUTE</code> operation, so that after the body executes, control jumps back to the start of the loop</li>
4620 <li>Move to the <code>cleanup</code> block</li>
4621 <li>Pop the <code>FOR_LOOP</code> frame block off the stack</li>
4622 <li>Visit the statements inside the <code>else</code> section of the for loop</li>
4623 <li>Use the <code>end</code> block</li>
4624 </ol>
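<p>The 15 steps can be mimicked with a toy model. In the sketch below, <code>Instr</code>, <code>Block</code>, <code>compile_for()</code>, and <code>assemble()</code> are hypothetical names. Instructions are plain strings, and every jump target is stored as an absolute offset for simplicity, although CPython encodes the <code>FOR_ITER</code> target relatively. The sketch lays the blocks out at two bytes per instruction, then resolves each jump to the offset of its target block:</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Instr:
    opname: str                         # opcode name, plus operand for readability
    target: Optional["Block"] = None    # jump target block, if any


@dataclass(eq=False)
class Block:
    label: str
    instrs: List[Instr] = field(default_factory=list)
    offset: int = 0                     # filled in by assemble()


def compile_for(iter_name, target_name, body_ops, else_ops):
    """Toy version of the 15 steps; returns blocks in layout order."""
    entry = Block("entry")
    start, cleanup, end = Block("start"), Block("cleanup"), Block("end")  # 1-3
    fblock_stack = [("FOR_LOOP", start, end)]                              # 4
    entry.instrs.append(Instr("LOAD_NAME " + iter_name))                   # 5
    entry.instrs.append(Instr("GET_ITER"))                                 # 6
    start.instrs.append(Instr("FOR_ITER", target=cleanup))                 # 7-8
    start.instrs.append(Instr("STORE_NAME " + target_name))                # 9
    start.instrs.extend(Instr(op) for op in body_ops)                      # 10
    start.instrs.append(Instr("JUMP_ABSOLUTE", target=start))              # 11
    fblock_stack.pop()                                                     # 12-13
    cleanup.instrs.extend(Instr(op) for op in else_ops)                    # 14
    return [entry, start, cleanup, end]                                    # 15


def assemble(blocks):
    """Give each block an offset (2 bytes per instruction), then resolve jumps."""
    offset = 0
    for block in blocks:
        block.offset = offset
        offset += 2 * len(block.instrs)
    listing = []
    for block in blocks:
        for i, ins in enumerate(block.instrs):
            target = ins.target.offset if ins.target is not None else None
            listing.append((block.offset + 2 * i, ins.opname, target))
    return listing


listing = assemble(compile_for("items", "x", ["LOAD_NAME x", "POP_TOP"], []))
for row in listing:
    print(*row)
```

<p>With these sample arguments, <code>FOR_ITER</code> points at offset 14, where the (empty) <code>else</code> section and the end of the loop begin. <code>JUMP_ABSOLUTE</code> points back to offset 4, the start of the loop.</p>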
4625 <p>Referring back to the <code>basicblock</code> structure, you can see how, in the compilation of the <code>for</code> statement, the various blocks are created and pushed onto the compiler&rsquo;s frame block stack:</p>
4626 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">int</span>
4627 <span class="nf">compiler_for</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">stmt_ty</span> <span class="n">s</span><span class="p">)</span>
4628 <span class="p">{</span>
4629     <span class="n">basicblock</span> <span class="o">*</span><span class="n">start</span><span class="p">,</span> <span class="o">*</span><span class="n">cleanup</span><span class="p">,</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
4630 
4631 <span class="hll">    <span class="n">start</span> <span class="o">=</span> <span class="n">compiler_new_block</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>                       <span class="c1">// 1.</span>
4632 </span><span class="hll">    <span class="n">cleanup</span> <span class="o">=</span> <span class="n">compiler_new_block</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>                     <span class="c1">// 2.</span>
4633 </span><span class="hll">    <span class="n">end</span> <span class="o">=</span> <span class="n">compiler_new_block</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>                         <span class="c1">// 3.</span>
4634 </span>    <span class="k">if</span> <span class="p">(</span><span class="n">start</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="o">||</span> <span class="n">end</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="o">||</span> <span class="n">cleanup</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
4635         <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
4636 
4637 <span class="hll">    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">compiler_push_fblock</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">FOR_LOOP</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">))</span>  <span class="c1">// 4.</span>
4638 </span>        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
4639 
4640 <span class="hll">    <span class="n">VISIT</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">expr</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">For</span><span class="p">.</span><span class="n">iter</span><span class="p">);</span>                       <span class="c1">// 5.</span>
4641 </span><span class="hll">    <span class="n">ADDOP</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">GET_ITER</span><span class="p">);</span>                                  <span class="c1">// 6.</span>
4642 </span><span class="hll">    <span class="n">compiler_use_next_block</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">start</span><span class="p">);</span>                   <span class="c1">// 7.</span>
4643 </span><span class="hll">    <span class="n">ADDOP_JREL</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">FOR_ITER</span><span class="p">,</span> <span class="n">cleanup</span><span class="p">);</span>                    <span class="c1">// 8.</span>
4644 </span><span class="hll">    <span class="n">VISIT</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">expr</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">For</span><span class="p">.</span><span class="n">target</span><span class="p">);</span>                     <span class="c1">// 9.</span>
4645 </span><span class="hll">    <span class="n">VISIT_SEQ</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">stmt</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">For</span><span class="p">.</span><span class="n">body</span><span class="p">);</span>                   <span class="c1">// 10.</span>
4646 </span><span class="hll">    <span class="n">ADDOP_JABS</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">JUMP_ABSOLUTE</span><span class="p">,</span> <span class="n">start</span><span class="p">);</span>                 <span class="c1">// 11.</span>
4647 </span><span class="hll">    <span class="n">compiler_use_next_block</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">cleanup</span><span class="p">);</span>                 <span class="c1">// 12.</span>
4648 </span>
4649 <span class="hll">    <span class="n">compiler_pop_fblock</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">FOR_LOOP</span><span class="p">,</span> <span class="n">start</span><span class="p">);</span>             <span class="c1">// 13.</span>
4650 </span>
4651 <span class="hll">    <span class="n">VISIT_SEQ</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">stmt</span><span class="p">,</span> <span class="n">s</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">.</span><span class="n">For</span><span class="p">.</span><span class="n">orelse</span><span class="p">);</span>                 <span class="c1">// 14.</span>
4652 </span><span class="hll">    <span class="n">compiler_use_next_block</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">end</span><span class="p">);</span>                     <span class="c1">// 15.</span>
4653 </span>    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
4654 <span class="p">}</span>
4655 </pre></div>
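<p>You can see what <code>compiler_for()</code> produces from the Python side with the standard library&rsquo;s <code>dis</code> module. This is a minimal sketch, and <code>loop()</code> is a made-up function. The exact opcodes vary between CPython versions; newer releases, for example, emit a backward jump instead of <code>JUMP_ABSOLUTE</code>. The <code>GET_ITER</code> and <code>FOR_ITER</code> instructions from steps 6 and 8 should still be there:</p>

```python
import dis

def loop(items):
    # A hypothetical function containing a for...else statement
    for item in items:
        print(item)
    else:
        print("done")

# Collect the names of the opcodes the compiler emitted for loop()
opnames = [instr.opname for instr in dis.get_instructions(loop)]
print("GET_ITER" in opnames, "FOR_ITER" in opnames)  # True True

# Print the full disassembly to compare against compiler_for()
dis.dis(loop)
```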
4656 
4657 <p>Depending on the type of operation, different arguments are required. For example, we used <code>ADDOP_JREL</code> and <code>ADDOP_JABS</code> here, which refer to &ldquo;<strong>ADD</strong> <strong>O</strong>peration with <strong>J</strong>ump to a <strong>REL</strong>ative position&rdquo; and &ldquo;<strong>ADD</strong> <strong>O</strong>peration with <strong>J</strong>ump to an <strong>ABS</strong>olute position&rdquo;, respectively. Both macros call <code>compiler_addop_j(struct compiler *c, int opcode, basicblock *b, int absolute)</code>, with the <code>absolute</code> argument set to 0 for <code>ADDOP_JREL</code> and 1 for <code>ADDOP_JABS</code>.</p>
4658 <p>There are other macros as well. <code>ADDOP_I</code> calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1383"><code>compiler_addop_i()</code></a>, which adds an operation with an integer argument, and <code>ADDOP_O</code> calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L1345"><code>compiler_addop_o()</code></a>, which adds an operation with a <code>PyObject</code> argument.</p>
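<p>In the finished bytecode, every instruction argument is an integer. An object argument is stored in a table on the code object, such as <code>co_consts</code>, and the instruction carries that object&rsquo;s index. The sketch below checks this for <code>LOAD_CONST</code> using <code>dis</code>. The function <code>greet()</code> is hypothetical:</p>

```python
import dis

def greet():
    # A hypothetical function whose body loads a string constant
    message = "hello"
    return message

code = greet.__code__
for instr in dis.get_instructions(greet):
    if instr.opname == "LOAD_CONST":
        # instr.arg is the raw integer oparg; instr.argval is the object it resolves to
        print(instr.arg, repr(instr.argval), repr(code.co_consts[instr.arg]))
```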
4659 <p>Once these stages have completed, the compiler has a list of basic blocks, each containing a list of instructions and a pointer to the next block.</p>
4660 <h4 id="assembly">Assembly</h4>
4661 <p>Using the compiler state, the assembler performs a depth-first search of the blocks and merges the instructions into a single bytecode sequence. The assembler state is declared in <code>Python/compile.c</code>:</p>
4662 <div class="highlight c"><pre><span></span><span class="k">struct</span> <span class="n">assembler</span> <span class="p">{</span>
4663     <span class="n">PyObject</span> <span class="o">*</span><span class="n">a_bytecode</span><span class="p">;</span>  <span class="cm">/* string containing bytecode */</span>
4664     <span class="kt">int</span> <span class="n">a_offset</span><span class="p">;</span>              <span class="cm">/* offset into bytecode */</span>
4665     <span class="kt">int</span> <span class="n">a_nblocks</span><span class="p">;</span>             <span class="cm">/* number of reachable blocks */</span>
4666     <span class="n">basicblock</span> <span class="o">**</span><span class="n">a_postorder</span><span class="p">;</span> <span class="cm">/* list of blocks in dfs postorder */</span>
4667     <span class="n">PyObject</span> <span class="o">*</span><span class="n">a_lnotab</span><span class="p">;</span>    <span class="cm">/* string containing lnotab */</span>
4668     <span class="kt">int</span> <span class="n">a_lnotab_off</span><span class="p">;</span>      <span class="cm">/* offset into lnotab */</span>
4669     <span class="kt">int</span> <span class="n">a_lineno</span><span class="p">;</span>              <span class="cm">/* last lineno of emitted instruction */</span>
4670     <span class="kt">int</span> <span class="n">a_lineno_off</span><span class="p">;</span>      <span class="cm">/* bytecode offset of last lineno */</span>
4671 <span class="p">};</span>
4672 </pre></div>
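<p>The <code>a_lnotab</code>, <code>a_lineno</code>, and <code>a_lineno_off</code> fields let the assembler record which source line each instruction came from. You can read the resulting offset-to-line mapping back with <code>dis.findlinestarts()</code>. This is a minimal sketch. The storage format of the table has changed in later CPython releases (PEP 626), but this helper hides that difference:</p>

```python
import dis

source = "x = 1\ny = 2\nz = x + y\n"
code = compile(source, "demo", "exec")

# findlinestarts() yields (bytecode offset, line number) pairs
for offset, lineno in dis.findlinestarts(code):
    print(offset, lineno)
```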
4673 
4674 <p>The <code>assemble()</code> function has a few tasks:</p>
4675 <ul>
4676 <li>Calculate the number of blocks for memory allocation</li>
4677 <li>Ensure that every block that falls off the end returns <code>None</code>; this is why a function that finishes without executing a <code>return</code> statement returns <code>None</code></li>
4678 <li>Resolve the offsets of any jump statements that were marked as relative</li>
4679 <li>Call <code>dfs()</code> to perform a depth-first-search of the blocks</li>
4680 <li>Emit all the instructions into the assembler&rsquo;s bytecode</li>
4681 <li>Call <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L5854"><code>makecode()</code></a> with the compiler state to generate the <code>PyCodeObject</code></li>
4682 </ul>
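<p>You can observe the implicit <code>return None</code> from Python. In this sketch, the hypothetical <code>no_return()</code> has no <code>return</code> statement, yet its bytecode still ends with a return instruction. That instruction is <code>RETURN_VALUE</code> on most versions and <code>RETURN_CONST</code> on some newer releases:</p>

```python
import dis

def no_return():
    # A hypothetical function without a return statement
    pass

result = no_return()
print(result is None)  # True

# The last instruction the assembler emitted is the implicit return
last = list(dis.get_instructions(no_return))[-1]
print(last.opname, last.argval)
```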
4683 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyCodeObject</span> <span class="o">*</span>
4684 <span class="nf">assemble</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="kt">int</span> <span class="n">addNone</span><span class="p">)</span>
4685 <span class="p">{</span>
4686     <span class="n">basicblock</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="o">*</span><span class="n">entryblock</span><span class="p">;</span>
4687     <span class="k">struct</span> <span class="n">assembler</span> <span class="n">a</span><span class="p">;</span>
4688     <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">nblocks</span><span class="p">;</span>
4689     <span class="n">PyCodeObject</span> <span class="o">*</span><span class="n">co</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
4690 
4691     <span class="cm">/* Make sure every block that falls off the end returns None.</span>
4692 <span class="cm">       XXX NEXT_BLOCK() isn&#39;t quite right, because if the last</span>
4693 <span class="cm">       block ends with a jump or return b_next shouldn&#39;t set.</span>
4694 <span class="cm">     */</span>
4695     <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">c</span><span class="o">-&gt;</span><span class="n">u</span><span class="o">-&gt;</span><span class="n">u_curblock</span><span class="o">-&gt;</span><span class="n">b_return</span><span class="p">)</span> <span class="p">{</span>
4696         <span class="n">NEXT_BLOCK</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
4697 <span class="hll">        <span class="k">if</span> <span class="p">(</span><span class="n">addNone</span><span class="p">)</span>
4698 </span><span class="hll">            <span class="n">ADDOP_LOAD_CONST</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">Py_None</span><span class="p">);</span>
4699 </span><span class="hll">        <span class="n">ADDOP</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">RETURN_VALUE</span><span class="p">);</span>
4700 </span>    <span class="p">}</span>
4701     <span class="p">...</span>
4702 <span class="hll">    <span class="n">dfs</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">entryblock</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">a</span><span class="p">,</span> <span class="n">nblocks</span><span class="p">);</span>
4703 </span>
4704     <span class="cm">/* Can&#39;t modify the bytecode after computing jump offsets. */</span>
4705 <span class="hll">    <span class="n">assemble_jump_offsets</span><span class="p">(</span><span class="o">&amp;</span><span class="n">a</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
4706 </span>
4707     <span class="cm">/* Emit code in reverse postorder from dfs. */</span>
4708 <span class="hll">    <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="n">a</span><span class="p">.</span><span class="n">a_nblocks</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">--</span><span class="p">)</span> <span class="p">{</span>
4709 </span><span class="hll">        <span class="n">b</span> <span class="o">=</span> <span class="n">a</span><span class="p">.</span><span class="n">a_postorder</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
4710 </span><span class="hll">        <span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="n">b</span><span class="o">-&gt;</span><span class="n">b_iused</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span>
4711 </span><span class="hll">            <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">assemble_emit</span><span class="p">(</span><span class="o">&amp;</span><span class="n">a</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">b</span><span class="o">-&gt;</span><span class="n">b_instr</span><span class="p">[</span><span class="n">j</span><span class="p">]))</span>
4712 </span><span class="hll">                <span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
4713 </span><span class="hll">    <span class="p">}</span>
4714 </span>    <span class="p">...</span>
4715 
4716     <span class="n">co</span> <span class="o">=</span> <span class="n">makecode</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">a</span><span class="p">);</span>
4717 <span class="hll"> <span class="nl">error</span><span class="p">:</span>
4718 </span>    <span class="n">assemble_free</span><span class="p">(</span><span class="o">&amp;</span><span class="n">a</span><span class="p">);</span>
4719     <span class="k">return</span> <span class="n">co</span><span class="p">;</span>
4720 <span class="p">}</span>
4721 </pre></div>
4722 
4723 <p>The depth-first search is performed by the <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L5397"><code>dfs()</code></a> function in <code>Python/compile.c</code>. It follows the <code>b_next</code> pointers from block to block, marks each block as seen by setting <code>b_seen</code>, and adds the blocks to the assembler&rsquo;s <code>**a_postorder</code> list in reverse order.</p>
4724 <p>The function then loops back over those blocks. For each block that contains a jump operation, it recursively calls <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L5397"><code>dfs()</code></a> on the jump&rsquo;s target, and then it appends the block itself to the post-order list:</p>
4725 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="kt">void</span>
4726 <span class="nf">dfs</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">basicblock</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="k">struct</span> <span class="n">assembler</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="n">end</span><span class="p">)</span>
4727 <span class="p">{</span>
4728     <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">;</span>
4729 
4730     <span class="cm">/* Get rid of recursion for normal control flow.</span>
4731 <span class="cm">       Since the number of blocks is limited, unused space in a_postorder</span>
4732 <span class="cm">       (from a_nblocks to end) can be used as a stack for still not ordered</span>
4733 <span class="cm">       blocks. */</span>
4734     <span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="n">end</span><span class="p">;</span> <span class="n">b</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">b</span><span class="o">-&gt;</span><span class="n">b_seen</span><span class="p">;</span> <span class="n">b</span> <span class="o">=</span> <span class="n">b</span><span class="o">-&gt;</span><span class="n">b_next</span><span class="p">)</span> <span class="p">{</span>
4735         <span class="n">b</span><span class="o">-&gt;</span><span class="n">b_seen</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
4736         <span class="n">assert</span><span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">a_nblocks</span> <span class="o">&lt;</span> <span class="n">j</span><span class="p">);</span>
4737         <span class="n">a</span><span class="o">-&gt;</span><span class="n">a_postorder</span><span class="p">[</span><span class="o">--</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>
4738     <span class="p">}</span>
4739     <span class="k">while</span> <span class="p">(</span><span class="n">j</span> <span class="o">&lt;</span> <span class="n">end</span><span class="p">)</span> <span class="p">{</span>
4740         <span class="n">b</span> <span class="o">=</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">a_postorder</span><span class="p">[</span><span class="n">j</span><span class="o">++</span><span class="p">];</span>
4741         <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">b</span><span class="o">-&gt;</span><span class="n">b_iused</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
4742             <span class="k">struct</span> <span class="n">instr</span> <span class="o">*</span><span class="n">instr</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">b</span><span class="o">-&gt;</span><span class="n">b_instr</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
4743             <span class="k">if</span> <span class="p">(</span><span class="n">instr</span><span class="o">-&gt;</span><span class="n">i_jrel</span> <span class="o">||</span> <span class="n">instr</span><span class="o">-&gt;</span><span class="n">i_jabs</span><span class="p">)</span>
4744                 <span class="n">dfs</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">instr</span><span class="o">-&gt;</span><span class="n">i_target</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">j</span><span class="p">);</span>
4745         <span class="p">}</span>
4746         <span class="n">assert</span><span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">a_nblocks</span> <span class="o">&lt;</span> <span class="n">j</span><span class="p">);</span>
4747         <span class="n">a</span><span class="o">-&gt;</span><span class="n">a_postorder</span><span class="p">[</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">a_nblocks</span><span class="o">++</span><span class="p">]</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>
4748     <span class="p">}</span>
4749 <span class="p">}</span>
4750 </pre></div>
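<p>The same traversal can be sketched in Python. <code>Block</code>, <code>dfs()</code>, and <code>emission_order()</code> here are hypothetical simplifications. <code>Block</code> stands in for <code>basicblock</code>, and the sketch uses a Python list and recursion instead of reusing the spare end of <code>a_postorder</code> as a stack. The idea is the same as in the C code: walk the fall-through chain first, visit jump targets while unwinding the chain, and then let <code>assemble()</code> emit the blocks in reverse postorder:</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Block:
    """Hypothetical stand-in for CPython's basicblock."""
    name: str
    next: Optional["Block"] = None                             # like b_next
    jump_targets: List["Block"] = field(default_factory=list)  # like i_target
    seen: bool = False                                         # like b_seen


def dfs(block, postorder):
    # First walk the fall-through chain, marking each block as seen
    chain = []
    while block is not None and not block.seen:
        block.seen = True
        chain.append(block)
        block = block.next
    # Then process the chain from its tail back to its head; each block is
    # appended only after the blocks it jumps to have been visited
    for b in reversed(chain):
        for target in b.jump_targets:
            dfs(target, postorder)
        postorder.append(b)


def emission_order(entry):
    postorder = []
    dfs(entry, postorder)
    # Like assemble(), emit the blocks in reverse postorder
    return [b.name for b in reversed(postorder)]


# Hypothetical blocks shaped like the ones compiler_for() creates
entry, start, cleanup, end = (Block(n) for n in ("entry", "start", "cleanup", "end"))
entry.next, start.next, cleanup.next = start, cleanup, end
start.jump_targets = [cleanup, start]  # FOR_ITER exits to cleanup; the body jumps back to start
order = emission_order(entry)
print(order)  # ['entry', 'start', 'cleanup', 'end']
```

<p>Because <code>seen</code> flags are set permanently, each traversal needs a freshly built set of blocks. The C version has the same requirement, since <code>b_seen</code> is never reset.</p>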
4751 
4752 <h4 id="creating-a-code-object">Creating a Code Object</h4>
4753 <p>The task of <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/compile.c#L5854"><code>makecode()</code></a> is to take the compiler state and some of the assembler&rsquo;s properties and put them into a <code>PyCodeObject</code> by calling <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/codeobject.c#L246"><code>PyCode_New()</code></a>:</p>
4754 <p><a href="https://files.realpython.com/media/codeobject.9c054576627c.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/codeobject.9c054576627c.png" width="201" height="550" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/codeobject.9c054576627c.png&amp;w=50&amp;sig=9d1c4ff65adb0d6d578b775ca93a88843a3742c1 50w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/codeobject.9c054576627c.png&amp;w=100&amp;sig=076765eedb49d9e4e944629435b5a0fc10942c4c 100w, https://files.realpython.com/media/codeobject.9c054576627c.png 201w" sizes="75vw" alt="PyCodeObject structure"/></a></p>
4755 <p>The variable names and constants are stored as properties of the code object:</p>
4756 <div class="highlight c"><pre><span></span><span class="k">static</span> <span class="n">PyCodeObject</span> <span class="o">*</span>
4757 <span class="nf">makecode</span><span class="p">(</span><span class="k">struct</span> <span class="n">compiler</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="k">struct</span> <span class="n">assembler</span> <span class="o">*</span><span class="n">a</span><span class="p">)</span>
4758 <span class="p">{</span>
4759 <span class="p">...</span>
4760 
4761     <span class="n">consts</span> <span class="o">=</span> <span class="n">consts_dict_keys_inorder</span><span class="p">(</span><span class="n">c</span><span class="o">-&gt;</span><span class="n">u</span><span class="o">-&gt;</span><span class="n">u_consts</span><span class="p">);</span>
4762     <span class="n">names</span> <span class="o">=</span> <span class="n">dict_keys_inorder</span><span class="p">(</span><span class="n">c</span><span class="o">-&gt;</span><span class="n">u</span><span class="o">-&gt;</span><span class="n">u_names</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
4763     <span class="n">varnames</span> <span class="o">=</span> <span class="n">dict_keys_inorder</span><span class="p">(</span><span class="n">c</span><span class="o">-&gt;</span><span class="n">u</span><span class="o">-&gt;</span><span class="n">u_varnames</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
4764 <span class="p">...</span>
4765     <span class="n">cellvars</span> <span class="o">=</span> <span class="n">dict_keys_inorder</span><span class="p">(</span><span class="n">c</span><span class="o">-&gt;</span><span class="n">u</span><span class="o">-&gt;</span><span class="n">u_cellvars</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
4766 <span class="p">...</span>
4767     <span class="n">freevars</span> <span class="o">=</span> <span class="n">dict_keys_inorder</span><span class="p">(</span><span class="n">c</span><span class="o">-&gt;</span><span class="n">u</span><span class="o">-&gt;</span><span class="n">u_freevars</span><span class="p">,</span> <span class="n">PyTuple_GET_SIZE</span><span class="p">(</span><span class="n">cellvars</span><span class="p">));</span>
4768 <span class="p">...</span>
4769     <span class="n">flags</span> <span class="o">=</span> <span class="n">compute_code_flags</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
4770     <span class="k">if</span> <span class="p">(</span><span class="n">flags</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>
4771         <span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
4772 
4773     <span class="n">bytecode</span> <span class="o">=</span> <span class="n">PyCode_Optimize</span><span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">a_bytecode</span><span class="p">,</span> <span class="n">consts</span><span class="p">,</span> <span class="n">names</span><span class="p">,</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">a_lnotab</span><span class="p">);</span>
4774 <span class="p">...</span>
4775     <span class="n">co</span> <span class="o">=</span> <span class="n">PyCode_NewWithPosOnlyArgs</span><span class="p">(</span><span class="n">posonlyargcount</span><span class="o">+</span><span class="n">posorkeywordargcount</span><span class="p">,</span>
4776                                    <span class="n">posonlyargcount</span><span class="p">,</span> <span class="n">kwonlyargcount</span><span class="p">,</span> <span class="n">nlocals_int</span><span class="p">,</span> 
4777                                    <span class="n">maxdepth</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">bytecode</span><span class="p">,</span> <span class="n">consts</span><span class="p">,</span> <span class="n">names</span><span class="p">,</span>
4778                                    <span class="n">varnames</span><span class="p">,</span> <span class="n">freevars</span><span class="p">,</span> <span class="n">cellvars</span><span class="p">,</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">c_filename</span><span class="p">,</span>
4779                                    <span class="n">c</span><span class="o">-&gt;</span><span class="n">u</span><span class="o">-&gt;</span><span class="n">u_name</span><span class="p">,</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">u</span><span class="o">-&gt;</span><span class="n">u_firstlineno</span><span class="p">,</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">a_lnotab</span><span class="p">);</span>
4780 <span class="p">...</span>
4781     <span class="k">return</span> <span class="n">co</span><span class="p">;</span>
4782 <span class="p">}</span>
4783 </pre></div>
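<p>The tuples that <code>makecode()</code> builds become the <code>co_*</code> attributes of the finished code object, and you can inspect them from Python. In this sketch, <code>outer()</code> and <code>inner()</code> are hypothetical functions chosen so that a local, a global name, a cell variable, and a free variable all appear:</p>

```python
def outer():
    count = 0                  # captured by inner(), so it is a cell variable of outer()
    def inner():
        total = count + 1      # count is a free variable of inner()
        print(total)           # print is a global name, stored in co_names
        return total
    return inner

code = outer().__code__
print(code.co_varnames)            # ('total',)
print(code.co_freevars)            # ('count',)
print(code.co_names)               # ('print',)
print(outer.__code__.co_cellvars)  # ('count',)
```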
4784 
4785 <p>You may also notice that the bytecode is sent to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/peephole.c#L230"><code>PyCode_Optimize()</code></a> before it is sent to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Objects/codeobject.c#L106"><code>PyCode_NewWithPosOnlyArgs()</code></a>. This function is part of the bytecode optimization process in <code>Python/peephole.c</code>.</p>
4786 <p>The peephole optimizer goes through the bytecode instructions and, in certain scenarios, replaces them with other instructions. One well-known optimization is &ldquo;constant folding&rdquo;. Since CPython 3.7, this optimization is performed earlier, by the AST optimizer in <code>Python/ast_opt.c</code>. Suppose you put the following statement into your script:</p>
4787 <div class="highlight python"><pre><span></span><span class="n">a</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">5</span>
4788 </pre></div>
4789 
4790 <p>It optimizes that to:</p>
4791 <div class="highlight python"><pre><span></span><span class="n">a</span> <span class="o">=</span> <span class="mi">6</span>
4792 </pre></div>
4793 
4794 <p>Because 1 and 5 are constant values, the result is always the same, so it can be computed once at compile time instead of every time the line runs.</p>
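<p>You can confirm the folding with <code>dis</code>, whichever compiler stage performs it in your version. In this sketch, the hypothetical <code>add()</code> contains no addition instruction at all, only a load of the precomputed constant <code>6</code>:</p>

```python
import dis

def add():
    # A hypothetical function containing the statement from the example
    a = 1 + 5
    return a

instrs = list(dis.get_instructions(add))
binary_ops = [i.opname for i in instrs if i.opname.startswith("BINARY_")]
loads_six = [i.opname for i in instrs if i.argval == 6]
print(binary_ops)  # []: no addition happens at run time
print(loads_six)   # a single load instruction for the folded constant 6
```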
4795 <h4 id="conclusion_2">Conclusion</h4>
4796 <p>We can pull together all of these stages with the instaviz module:</p>
4797 <div class="highlight python"><pre><span></span><span class="kn">import</span> <span class="nn">instaviz</span>
4798 
4799 <span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
4800     <span class="n">a</span> <span class="o">=</span> <span class="mi">2</span><span class="o">**</span><span class="mi">4</span>
4801     <span class="n">b</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">5</span>
4802     <span class="n">c</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
4803     <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">c</span><span class="p">:</span>
4804         <span class="nb">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
4805     <span class="k">else</span><span class="p">:</span>
4806         <span class="nb">print</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
4807     <span class="k">return</span> <span class="n">c</span>
4808 
4809 
4810 <span class="n">instaviz</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">foo</span><span class="p">)</span>
4811 </pre></div>
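<p><code>instaviz</code> is a third-party package. If you don&rsquo;t have it installed, the standard library&rsquo;s <code>ast</code> and <code>dis</code> modules expose the same stages in text form. The sketch below compiles the same <code>foo()</code> from a source string. Because folding happens during compilation, the AST still contains the <code>2**4</code> operation, while the bytecode does not:</p>

```python
import ast
import dis

source = """
def foo():
    a = 2**4
    b = 1 + 5
    c = [1, 4, 6]
    for i in c:
        print(i)
    else:
        print(a)
    return c
"""

tree = ast.parse(source)                  # the AST stage
print(ast.dump(tree.body[0].body[0]))     # the node for 'a = 2**4'

namespace = {}
exec(compile(tree, "foo_source", "exec"), namespace)  # compile the AST to a code object
foo = namespace["foo"]

dis.dis(foo)                              # the bytecode instructions in sequence
print(foo.__code__.co_varnames)           # the local variable names
print(foo.__code__.co_consts)             # the constants table
```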
4812 
4813 <p>This will produce an AST graph:</p>
4814 <p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.18.32_pm.4d9a0ea827ff.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.18.32_pm.4d9a0ea827ff.png" width="2788" height="1554" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.18.32_pm.4d9a0ea827ff.png&amp;w=697&amp;sig=9106a1bc23c5cc07f2ef159968c1ba2155a6562d 697w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.18.32_pm.4d9a0ea827ff.png&amp;w=1394&amp;sig=244bebb56c46d0b1eef97b8332afb6bfaa5e3648 1394w, https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.18.32_pm.4d9a0ea827ff.png 2788w" sizes="75vw" alt="Instaviz screenshot 6"/></a></p>
4815 <p>It also shows the bytecode instructions in sequence:</p>
4816 <p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.54_pm.6ea8ea532015.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.54_pm.6ea8ea532015.png" width="2536" height="1592" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.54_pm.6ea8ea532015.png&amp;w=634&amp;sig=f499889305c84679bdf07256f978975bbbc98c03 634w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.54_pm.6ea8ea532015.png&amp;w=1268&amp;sig=8d86595bce0f05c8dc45c3db3c0bff9b2d9cb0a9 1268w, https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.54_pm.6ea8ea532015.png 2536w" sizes="75vw" alt="Instaviz screenshot 7"/></a></p>
4817 <p>Finally, it shows the code object, with its variable names, constants, and binary <code>co_code</code>:</p>
4818 <p><a href="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.41_pm.231a0678f142.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.41_pm.231a0678f142.png" width="2098" height="940" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.41_pm.231a0678f142.png&amp;w=524&amp;sig=6daa3f3b9841eabbbf87d73886b81d346cdb33b3 524w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.41_pm.231a0678f142.png&amp;w=1049&amp;sig=7b074cc948021547f3da60e6bf0be3747585c6c8 1049w, https://files.realpython.com/media/Screen_Shot_2019-03-20_at_3.17.41_pm.231a0678f142.png 2098w" sizes="75vw" alt="Instaviz screenshot 8"/></a></p>
4819 <h3 id="execution">Execution</h3>
4820 <p>In <code>Python/pythonrun.c</code>, we left off just before the call to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a>.</p>
4821 <p>This call takes a code object, either fetched from the marshaled <code>.pyc</code> file, or compiled through the AST and compiler stages.</p>
4822 <p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/pythonrun.c#L1094"><code>run_eval_code_obj()</code></a> passes the compiled <code>PyCodeObject</code>, along with the globals and locals, to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> in <code>Python/ceval.c</code>.</p>
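<p>From Python code, the built-in <code>exec()</code> takes the same ingredients: a code object plus globals and, optionally, locals mappings. In this version of CPython, <code>exec()</code> evaluates a code object through <code>PyEval_EvalCode()</code>; treat that as an implementation detail. A minimal sketch:</p>

```python
# Compile a source string into a code object
code = compile("result = x * 2", "demo", "exec")

# This dictionary serves as the globals (and, here, also the locals)
namespace = {"x": 21}
exec(code, namespace)
print(namespace["result"])  # 42
```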
4823 <p>This stage forms the execution component of CPython. Each of the bytecode operations is taken and executed using a <a href="http://www.cs.uwm.edu/classes/cs315/Bacon/Lecture/HTML/ch10s07.html">&ldquo;Stack Frame&rdquo; based system</a>.</p>
4824 <div class="alert alert-primary" role="alert">
4825 <p><strong>What is a Stack Frame?</strong></p>
4826 <p>Stack Frames are a data type used by many runtimes, not just Python, that allows functions to be called and values to be returned between functions. Stack Frames also contain arguments, local variables, and other state information.</p>
4827 <p>Typically, a Stack Frame exists for every function call, and frames are stacked in the order the calls were made. You can see CPython&rsquo;s frame stack any time an exception goes unhandled and the traceback is printed on the screen.</p>
4828 </div>
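<p>You can also walk this stack of frames from Python code. The sketch below starts at <code>sys._getframe()</code> and follows each frame&rsquo;s <code>f_back</code> pointer back toward the caller. <code>sys._getframe()</code> is a CPython implementation detail. The functions <code>outer()</code>, <code>middle()</code>, and <code>inner()</code> are hypothetical helpers for the example.</p>

```python
import sys

def inner():
    # Start at the current frame and follow f_back to each caller's frame.
    frame = sys._getframe()
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names

def middle():
    return inner()

def outer():
    return middle()

stack = outer()
print(stack[:3])  # ['inner', 'middle', 'outer']
```

<p>The innermost call comes first in this list. A printed traceback shows the same frames in the opposite order, most recent call last.</p>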
4829 <p><a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> is the public API for evaluating a code object. The logic for evaluation is split between <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L4045"><code>_PyEval_EvalCodeWithName()</code></a> and <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L745"><code>_PyEval_EvalFrameDefault()</code></a>, which are both in <code>ceval.c</code>.</p>
4830 <p>The public API <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> will construct an execution frame from the top of the stack by calling <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L4045"><code>_PyEval_EvalCodeWithName()</code></a>.</p>
4831 <p>The construction of the first execution frame has many steps:</p>
4832 <ol>
4833 <li>Keyword and positional arguments are resolved.</li>
4834 <li>The use of <code>*args</code> and <code>**kwargs</code> in function definitions is resolved.</li>
4835 <li>Arguments are added as local variables to the scope.</li>
4836 <li>Coroutines and <a href="https://realpython.com/introduction-to-python-generators/">generators</a>, including asynchronous generators, are created.</li>
4837 </ol>
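<p>Here is a rough Python-level view of what those steps produce, assuming CPython. By the time the body of the hypothetical <code>example()</code> function starts running, several kinds of values are already locals of the new frame: positional arguments, defaults, <code>*args</code>, keyword-only arguments, and <code>**kwargs</code>. Calling a generator function works differently. It only creates the generator object, and its body does not run yet.</p>

```python
import inspect
import sys

def example(a, b=10, *args, key=None, **kwargs):
    # The frame's locals are already populated from the call's arguments.
    return dict(sys._getframe().f_locals)

seen = example(1, 2, 3, 4, key="k", extra=True)
print(seen["args"])    # (3, 4): surplus positional values packed into a tuple
print(seen["kwargs"])  # {'extra': True}: unmatched keywords packed into a dict

def numbers():
    yield 1

gen = numbers()  # creates a generator object; the body has not started
print(inspect.getgeneratorstate(gen))  # GEN_CREATED
```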
4838 <p>The frame object looks like this:</p>
4839 <p><a href="https://files.realpython.com/media/PyFrameObject.8616eee0503e.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/PyFrameObject.8616eee0503e.png" width="161" height="408" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/PyFrameObject.8616eee0503e.png&amp;w=40&amp;sig=5c85bcc7939e61d207a19cf82d23cab3f73ec760 40w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/PyFrameObject.8616eee0503e.png&amp;w=80&amp;sig=25333bbe22791650facbac4288bb9f49065fb014 80w, https://files.realpython.com/media/PyFrameObject.8616eee0503e.png 161w" sizes="75vw" alt="PyFrameObject structure"/></a></p>
4840 <p>Let&rsquo;s walk through these steps.</p>
4841 <h4 id="1-constructing-thread-state">1. Constructing Thread State</h4>
4842 <p>Before a frame can be executed, it needs to be referenced from a thread. CPython can have many threads alive at any one time within a single interpreter. The interpreter state keeps those threads in a linked list. The thread structure is called <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Include/pystate.h#L23"><code>PyThreadState</code></a>, and it is referenced throughout <code>ceval.c</code>.</p>
4843 <p>Here is the structure of the thread state object:</p>
4844 <p><a href="https://files.realpython.com/media/PyThreadState.20467f3689b7.png" target="_blank"><img class="img-fluid mx-auto d-block " src="https://files.realpython.com/media/PyThreadState.20467f3689b7.png" width="201" height="208" srcset="https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/PyThreadState.20467f3689b7.png&amp;w=50&amp;sig=90efd8d98ffa8ad9ed8b233c1e73fa469e4db4ac 50w, https://robocrop.realpython.net/?url=https%3A//files.realpython.com/media/PyThreadState.20467f3689b7.png&amp;w=100&amp;sig=e5d4a21dbc5cce9c1017a6d1805cf9eedf03ac9c 100w, https://files.realpython.com/media/PyThreadState.20467f3689b7.png 201w" sizes="75vw" alt="PyThreadState structure"/></a></p>
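<p>Each thread keeps its own stack of frames. This minimal sketch assumes CPython and uses <code>sys._current_frames()</code>, another implementation detail. That function returns a dictionary mapping each running thread&rsquo;s identifier to the topmost frame of that thread. The two events only keep the hypothetical <code>worker()</code> thread alive while the snapshot is taken.</p>

```python
import sys
import threading

ready = threading.Event()    # set by the worker once it is running
release = threading.Event()  # set by the main thread to let the worker exit

def worker():
    ready.set()
    release.wait()  # park here so this thread still has a live frame stack

thread = threading.Thread(target=worker)
thread.start()
ready.wait()

# Snapshot: thread identifier mapped to the topmost frame of that thread.
frames = sys._current_frames()
print(threading.get_ident() in frames)  # True: the main thread
print(thread.ident in frames)           # True: the parked worker

release.set()
thread.join()
```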
4845 <h4 id="2-constructing-frames">2. Constructing Frames</h4>
4846 <p>The input to <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L716"><code>PyEval_EvalCode()</code></a> and therefore <a href="https://github.com/python/cpython/blob/d93605de7232da5e6a182fd1d5c220639e900159/Python/ceval.c#L4045"><code>_PyEval_EvalCodeWithName()</code></a> has arguments for:</p>
4847 <ul>
4848 <li><strong><code>_co</code>:</strong> a <code>PyCodeObject</code></li>
4849 <li><strong><code>globals</code>:</strong> a <code>PyDict</code> with variable names as keys and their values</li>
4850 <li><strong><code>locals</code>:</strong> a <code>PyDict</code> with variable names as keys and their values</li>
4851 </ul>
4852 <p>The other arguments are optional and are not used by the basic API:</p>
4853 <ul>
4854 <li><strong><code>args</code>:</strong> an array of positional argument values in order, and <code>argcount</code> for the number of values</li>
4855 <li><strong><code>kwnames</code>:</strong> a list of keyword argument names</li>
4856 <li><strong><code>kwargs</code>:</strong> a list of keyword argument values, and <code>kwcount</code> for the number of them</li>
4857 <li><strong><code>defs</code>:</strong> a list of default values for positional arguments, and <code>defcount</code> for the length</li>
4858 <li><strong><code>kwdefs</code>:</strong> a dictionary with the default values for keyword arguments</li>
4859 <li><strong><code>closure</code>:</strong> a tuple of cell objects that supply the values for the code object&rsquo;s <code>co_freevars</code></li>
4860 <li><strong><code>name</code>:</strong> the name for this evaluation statement as a string</li>
4861 <li><strong><code>qualname</code>:</strong> the qualified name for this evaluation statement as a string</li>
4862 </ul>
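<p>Several of these optional parameters have visible counterparts on Python function objects. The mapping below is a hedged sketch that assumes CPython. In CPython, a function call passes the function&rsquo;s defaults, keyword-only defaults, closure, name, and qualified name down to the evaluation code. The <code>make_counter()</code> helper is hypothetical.</p>

```python
def make_counter(start=0, *, step=1):
    count = start

    def counter(increment=step, *, label="count"):
        nonlocal count  # count is a free variable, supplied through the closure
        count += increment
        return label, count

    return counter

c = make_counter(5, step=2)
print(c.__defaults__)          # (2,): comparable to defs and defcount
print(c.__kwdefaults__)        # {'label': 'count'}: comparable to kwdefs
print(c.__code__.co_freevars)  # ('count',)
print([cell.cell_contents for cell in c.__closure__])  # [5]: the closure cells
print(c.__name__)              # counter: comparable to name
print(c.__qualname__)          # the dotted path: comparable to qualname
```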
4863 <div class="highlight c"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span>
4864 <span class="nf">_PyEval_EvalCodeWithName</span><span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="n">_co</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">globals</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">locals</span><span class="p">,</span>
4865            <span class="n">PyObject</span> <span class="o">*</span><span class="k">const</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="n">Py_ssize_t</span> <span class="n">argcount</span><span class="p">,</span>
4866            <span class="n">PyObject</span> <span class="o">*</span><span class="k">const</span> <span class="o">*</span><span class="n">kwnames</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="k">const</span> <span class="o">*</span><span class="n">kwargs</span><span class="p">,</span>
4867            <span class="n">Py_ssize_t</span> <span class="n">kwcount</span><span class="p">,</span> <span class="kt">int</span> <span class="n">kwstep</span><span class="p">,</span>
4868            <span class="n">PyObject</span> <span class="o">*</span><span class="k">const</span> <span class="o">*</span><span class="n">defs</span><span class="p">,</span> <span class="n">Py_ssize_t</span> <span class="n">defcount</span><span class="p">,</span>
4869            <span class="n">PyObject</span> <span class="o">*</span><span class="n">kwdefs</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">closure</span><span class="p">,</span>
4870            <span class="n">PyObject</span> <span class="o">*</span><span class="n">name</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">qualname</span><span class="p">)</span>
4871 <span class="p">{</span>
4872     <span class="p">...</span>
4873 
4874     <span class="n">PyThreadState</span> <span class="o">*</span><span class="n">tstate</span> <span class="o">=</span> <span class="n">_PyThreadState_GET</span><span class="p">();</span>
4875     <span class="n">assert</span><span class="p">(</span><span class="n">tstate</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">);</span>
4876 
4877     <span class="k">if</span> <span class="p">(</span><span class="n">globals</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
4878         <span class="n">_PyErr_SetString</span><span class="p">(</span><span class="n">tstate</span><span class="p">,</span> <span class="n">PyExc_SystemError</span><span class="p">,</span>
4879                          <span class="s">&quot;PyEval_EvalCodeEx: NULL globals&quot;</span><span class="p">);</span>
4880         <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
4881     <span class="p">}</span>
4882 
4883     <span class="cm">/* Create the frame */</span>
4884     <span class="n">f</span> <span class="o">=</span> <span class="n">_PyFrame_New_NoTrack</span><span class="p">(</span><span class="n">tstate</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">globals</span><span class="p">,</span> <span class="n">locals</span><span class="p">);</span>
4885     <span class="k">if</span> <span class="p">(</span><span class="n">f</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
4886         <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
4887     <span class="p">}</span>
4888     <span class="n">fastlocals</span> <span class="o">=</span> <span class="n">f</span><span class="o">-&gt;</span><span class="n">f_localsplus</span><span class="p">;</span>
4889     <span class="n">freevars</span> <span class="o">=</span> <span class="n">f</span><span class="o">-&gt;</span><span class="n">f_localsplus</span> <span class="o">+</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">co_nlocals</span><span class="p">;</span>
4890 </pre></div>
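<p>The last two lines of that excerpt split <code>f_localsplus</code> into regions. The first <code>co_nlocals</code> slots hold fast locals, and the cell and free variables follow them. The sketch below rebuilds that ordering from Python by reading the code object&rsquo;s metadata. It models the layout in the CPython version linked above and does not look at real frame memory. Newer CPython releases have reorganized frame storage. The functions <code>outer()</code>, <code>inner()</code>, and <code>slot_layout()</code> are hypothetical.</p>

```python
def outer(a, b):
    shared = a + b        # a cell variable, because inner() captures it
    scratch = shared * 2  # an ordinary fast local

    def inner():
        return shared     # a free variable from inner()'s point of view

    return inner, scratch

def slot_layout(code):
    """Model the f_localsplus order: fast locals, then cells, then frees."""
    fast = list(code.co_varnames[: code.co_nlocals])
    return fast + list(code.co_cellvars) + list(code.co_freevars)

print(outer.__code__.co_nlocals)  # number of fast-local slots
print(slot_layout(outer.__code__))
inner_fn, _ = outer(1, 2)
print(slot_layout(inner_fn.__code__))  # ['shared']
```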
4891 
4892 <h4 id="3-converting-keyword-parameters-to-a-dictionary">3. Converting Keyword Parameters to a Dictionary</h4>
4893 <p>If the function definition contains a <code>**kwargs</code>-style catch-all for keyword arguments, then a new dictionary is created, and the unmatched keyword values are copied into it. The <code>kwargs</code> name is then bound as a local variable, like in this example:</p>
4894 <div class="highlight python"><pre><span></span><span class="k">def</span> <span class="nf">example</span><span class="p">(</span><span class="n">arg</span><span class="