Why is your Python app so slow? Find out by using Python's built-in profiler to locate bottlenecks in your Python code.

Python may not be the fastest language around, but it is often fast enough. And Python is ideal when programmer time matters more than CPU time. That said, if a given Python app is laggy, you're not obliged to just suck it up. Tools included with the stock installation of the Python interpreter can give you detailed feedback about which parts of your program are slow, and offer some hints about how to speed them up.

How to use cProfile

The cProfile module gathers statistics about the execution time of a Python program. It can report on anything from the entire app to a single statement or expression. Here is a toy example of how to use cProfile:

def add(x, y):
    x += str(y)
    return x

def add_2(x, y):
    if y % 20000 == 0:
        z = []
        for q in range(0, 400000):
            z.append(q)

def main():
    a = []
    for n in range(0, 200000):
        add(a, n)
        add_2(a, n)

if __name__ == '__main__':
    import cProfile
    cProfile.run('main()')

This example runs the application's main() function and analyzes the performance of main() and everything main() calls. It is also possible to analyze only a part of a program, but the most common use for starters is to profile the whole program.

Run the above example and you will be greeted with something like the following output:

[Screenshot: cProfile output for the example program, sorted by standard name]

What is shown here is a list of all of the function calls made by the program, along with statistics about each:

At the top (the first line of the report), we see the total number of calls made in the profiled program and the total execution time. You may also see a figure for "primitive calls," meaning non-recursive calls, or calls made directly to a function that don't in turn call themselves further down in the call stack.

ncalls: Number of calls made. If you see two numbers separated by a slash, the second number is the number of primitive calls for that function.
tottime: Total time spent in the function, not including calls to other functions.
percall: Average time per call for tottime, derived by dividing tottime by ncalls.
cumtime: Total time spent in the function, including calls to other functions.
percall (#2): Average time per call for cumtime (cumtime divided by ncalls).
filename:lineno: The file name, line number, and function name for the call in question.

How to modify cProfile reports

By default, cProfile sorts its output by "standard name," meaning that it sorts by the text in the far-right column (filename, line number, etc.). The default format is useful if you want a general, top-down report of every single function call for reference. But if you're trying to get to the bottom of a bottleneck, you will likely want the most time-consuming parts of the program listed first.

We can produce these results by invoking cProfile a little differently. Note how the bottom part of the above program can be reworked to sort the statistics by a different column (in this case ncalls):

if __name__ == '__main__':
    import cProfile, pstats
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats('ncalls')
    stats.print_stats()

The results will look something like this:

[Screenshot: cProfile output for the example program, sorted by ncalls]

Here is how all this works: Instead of executing a command by way of cProfile.run(), which is not very flexible, we create a profiling object, profiler. When we want to profile some action, we first call .enable() on the profiler object instance, then run the action, then call .disable().
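As a quick illustration of that enable/disable pattern, here is a minimal sketch of profiling only one slice of a larger run. The names warm_up and heavy_work are hypothetical placeholders, standing in for whichever parts of your own program you do and don't want measured:

import cProfile, pstats

def warm_up():
    # Hypothetical setup work we don't want in the report
    return list(range(10000))

def heavy_work(data):
    # Hypothetical code we do want to measure
    return sum(str(n).count('9') for n in data)

if __name__ == '__main__':
    data = warm_up()                  # runs outside the profiler
    profiler = cProfile.Profile()
    profiler.enable()                 # start collecting statistics
    heavy_work(data)                  # only this call is profiled
    profiler.disable()                # stop collecting statistics
    pstats.Stats(profiler).sort_stats('tottime').print_stats()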
(This is one way to profile only part of a program.)

The pstats module is used to manipulate the results collected by the profiler object and to print those results. Combining a profiler object and pstats lets us manipulate the captured profile data, for instance to sort the generated statistics differently. In this example, using .sort_stats('ncalls') sorts the statistics by the ncalls column. Other sort options are available.

How to use cProfile results for optimization

The sort options available for cProfile output allow us to tease out potential performance bottlenecks in a program.

ncalls

The first and most significant piece of information you can unearth with cProfile is which functions get called most frequently, by way of the ncalls column. In Python, the mere act of making a function call incurs a relatively large amount of overhead. If some function is called repeatedly in a tight loop, even if it isn't a long-running function, that is guaranteed to impact performance. In the above example, the functions add and add_2 are called repeatedly in a loop. Moving the loop into the add function itself, or inlining the add function entirely, would fix this problem.

tottime

Another useful statistic details which functions the program spends most of its time executing, by way of the tottime column. In the above example, the add_2 function uses a loop to simulate some expensive computation, which pushes its tottime score to the top. Any function with a high tottime score deserves a close look, especially if it is called many times or in a tight loop.

Note that you always need to consider the context in which the function is used. If a function has a high tottime but is called only once, for instance only when the program starts, it is less likely to be a bottleneck. However, if you're trying to reduce startup time, you'll want to know whether a function called at startup is making everything else wait.

How to export cProfile data

If you want to use cProfile's generated statistics in more advanced ways, you can export them to a data file:

stats = pstats.Stats(profiler)
stats.dump_stats('/path/to/stats_file.dat')

This file can be read back in by using the pstats module, then sorted or displayed with pstats (a short sketch appears at the end of this article). The data can also be re-used by other programs. Two examples:

pyprof2calltree (https://pypi.org/project/pyprof2calltree/) renders detailed visualizations of the program's call graph and usage stats from profile data. A separate article provides a detailed real-world example of its use.

snakeviz (https://github.com/jiffyclub/snakeviz) also generates visualizations from cProfile data, but uses a different representation for the data: a "sunburst" rather than pyprof2calltree's "flame" graph.

Beyond cProfile for Python profiling

cProfile is hardly the only way to profile a Python application. It is certainly one of the most convenient ways, given that it's bundled with Python, but others deserve attention.

One project, py-spy (https://github.com/benfred/py-spy), builds a profile for a Python application by sampling its call activity. py-spy can be used to examine a running Python app without having to stop and restart it, and without having to alter its codebase, so it can be used to profile deployed applications. py-spy also generates some statistics about the overhead incurred by the Python runtime (for instance, garbage collection overhead), which cProfile does not.
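To close the loop on the exported data described above, here is a minimal sketch of reading a dumped stats file back in with pstats and re-sorting it. It simply reuses the file path from the export example:

import pstats

# Load the statistics previously saved with dump_stats()
stats = pstats.Stats('/path/to/stats_file.dat')

# Sort by cumulative time and show only the ten most expensive entries
stats.sort_stats('cumtime').print_stats(10)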