Google App Engine is a surprisingly powerful and scalable platform, which can cope with even the most demanding Google Wave robots in terms of CPU usage. However, as your robot becomes popular among Wave users, it will face a growing number of requests and it will soon become necessary to make the robot more efficient. This is especially true for robots which perform non-trivial operations when reacting to frequently occurring events such as BlipSubmitted and DocumentChanged. There is a significant risk that a large number of requests will ultimately cause the application running the robot to go over quota.
This post is not about ways to improve the efficiency of your robot. Instead, It is about a possible method to measure the performance of the code being run by your robot, such that you will be able to identify which function calls are slow or resource-consuming. In other words, this post is about profiling, which is an essential process to ensure you squeeze the maximum out of your robot.
You can use the method described here for Python robots. If you have already published your robot, I recommend adding the profiling code to a “staging” version of your robot, which is not deployed to “production”. This is possible with App Engine’s versioning system.
So let’s start. First, add the following import statements to the main module of your application:
import sys import StringIO import cProfile import pstats
The next step is to choose which event handler we wish to profile. For the sake of illustration, let’s choose the event BlipSubmitted. You should normally find the following line of code in the application:
myRobot.register_handler(events.BlipSubmitted, OnBlipSubmitted)
Replace the event handler with a wrapped version of it called OnBlipSubmittedProfiler (you can use other names, of course):
myRobot.register_handler(events.BlipSubmitted, OnBlipSubmittedProfiler)
Then, create the new method as follows:
def OnBlipSubmittedProfiler(event, wavelet):
prof = cProfile.Profile()
prof = prof.runctx('OnBlipSubmitted(event, wavelet)', globals(), locals())
virtual_output = StringIO.StringIO()
sys.stdout = virtual_output
stats = pstats.Stats(prof)
stats.sort_stats("time")
stats.print_stats()
wavelet.reply().append(virtual_output.getvalue())
sys.stdout = sys.__stdout__
In lines 2-3, we construct a new profiler and use it to run the original event handler. In lines 4-5 we use a trick to redirect the standard output to a StringIO, which is required to capture the output of the method print_stats. In lines 6-7 we format the results using the pstats module, sorted by the internal execution time of each call, so we can focus on the slowest functions first. Finally, in line 9, we output the results to a new blip in the wave. You can specify a parameter to the method print_stats, to limit the number of results or to filter them in other ways. For example, print_stats(100) will only return 100 calls. Also, you can log the results instead of printing them in a wave.
Upload the modified application and trigger a BlipSubmitted event. The robot will continue to behave normally but in addition it will also output the profiling results, which will include the total number of calls, the total run-time and a table describing each call in detail. Simple operations should take in total less than 0.01 seconds. You might want to repeat the operation several times or with different input to get a better sense of the amortized run-time of the robot and to take into account factors such as caching.
The following is an example of the output returned by a simple robot:
229 function calls (224 primitive calls) in 0.001 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.001 0.001 /base/data/home/apps/blipmeter/1.341184798852769657/main.py:44(GetWaveStat)
1 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/main.py:34(getMaxBlip)
1 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/blip.py:356(_execute)
4 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/blip.py:753(serialize)
1 0.000 0.000 0.001 0.001 /base/data/home/apps/blipmeter/1.341184798852769657/main.py:71(OnBlipSubmitted)
7 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/blip.py:315(_find)
14 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/blip.py:78(serialize)
7 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/blip.py:307(_elem_matches)
2 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/blip.py:529(__nonzero__)
1 0.000 0.000 0.000 0.000 /base/python_dist/lib/python2.5/logging/__init__.py:216(__init__)
4 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/util.py:58(non_none_dict)
4 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/blip.py:196(serialize)
2 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/blip.py:277(all)
1 0.000 0.000 0.000 0.000 /base/python_dist/lib/python2.5/logging/__init__.py:1132(callHandlers)
1 0.000 0.000 0.000 0.000 /base/python_dist/lib/python2.5/logging/__init__.py:405(format)
1 0.000 0.000 0.000 0.000 /base/python_lib/versions/1/google/appengine/api/app_logging.py:80(_AppLogsMessage)
4 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/element.py:88(serialize)
1 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/ops.py:199(new_operation)
2 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/blip.py:738(first)
15 0.000 0.000 0.000 0.000 {isinstance}
1 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/blip.py:535(value)
1 0.000 0.000 0.000 0.000 /base/data/home/apps/blipmeter/1.341184798852769657/waveapi/wavelet.py:277(serialize)
1 0.000 0.000 0.000 0.000 /base/python_dist/lib/python2.5/posixpath.py:74(split)
It is important to remember that this is a way to measure the run-time of the event handler itself, from the point of receiving the initialized wavelet and event objects to the final statement of the handler. All the operations before and after that, as well as the server-work, are not taken into account. For that reason, this measurement does not generally indicate the response time of a robot. Its purpose is to investigate the performance of the robot as an application serving a large number of requests.