PyHP

A Python HTML Preprocessor

User and Programmer Manual

Copyright (C) 2003 by David McNab
david at freenet dot org dot nz
Released under the GNU General Public License
(pyWeb homepage is http://www.freenet.org.nz/python/pyweb)






0. Introduction

0.1 Why Another Damn Preprocessor?

Because PHP sucks, Python rules, and I wanted an easy HTML templating system with some of PHP's redeeming features that works seamlessly with the PyWeb framework.

0.2 What Does PyHP Do?

If you've ever programmed in PHP, you'll already have the general feel.

PyHP allows you to embed python code into HTML files, in a way which works comfortably with graphical HTML editors like Mozilla Composer.

0.3 Philosophy

I looked at a lot of active web frameworks, particularly python-based ones. While some of them made good attempts to separate presentation from logic, they made the HTML code subordinate to the python code. Or their markup syntax was just plain ugly (sorry, Spyce people - I just can't get into enclosing all python lines in double-brackets).

PyHP lets you design your websites visually and top-down.

Typically you'd use your favourite HTML editor (be it Mozilla Composer, Dreamweaver, Emacs/vi, Notepad, cat, dd,...) and create the visual appearance of your page.

Then you just stick in markups to add any stuff that can vary - your dynamic content.

So, the visual design drives the whole process, and the python code is subordinate. As you create nice re-usable bits of code, you would ideally factor them out and put them into a snippets directory, where they can be shared between any number of pages.

Therefore, you have the freedom to think with your right brain, and keep a complete grasp of the big picture, without constantly tripping over the menial i-dotting and t-crossing of coding issues.




1. Usage

1.1 Quickstart

Let's look at a very minimal PyHP-powered web page.

As you can see below, Python code is marked up with <!--python ... --> comment blocks.

Here we go:
#!/usr/bin/env pyhp
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>
<title>My first PyHP page</title>
</head>
<body>
<h1>My First PyHP Page</h1>
Below should appear some python output.<br>
<!--python
print('<div style="color:green">Hi from Python!</div>')
-->
And here we are back in regular HTML
</body>
</html>

Ok, let's break it down:
So, what does this produce? Something like:

My First PyHP Page

Below should appear some python output.
Hi from Python!
And here we are back in regular HTML

1.2 Basic Principles

Firstly, by convention, PyHP files are named with the suffix .py.html. This does two things: it marks the file as a PyHP file, and it keeps .html as the final suffix, without which certain idiot-savant editors like Mozilla Composer will moronically refuse to open the file for visual editing.

Secondly, PyHP files can either run as CGI scripts, or run directly under webservers which support PyHP. (Presently, I'm using Python base classes to run my own webserver, which runs PyHP and avoids the overhead of starting a Python VM on every hit. But all going well, I (or someone else) will write a mod_pyhp.c extension for Apache.)

Thirdly, Python code in PyHP files can call an include() function to load and execute other PyHP files, and even pass arguments to these included files. There is no effective limit to such nesting, and included files incur no extra Python startup overhead.

Fourthly, the python code within PyHP scripts inherits a rich namespace of powerful objects, classes and variables. These make it intuitive and convenient to do stuff like:
Needless to say, your Python code can import any standard (or your own) python modules to leverage its power.

Lastly, your Python code gets executed in the context of the function pyweb.pyhp.preprocess(), inheriting its globals and locals.

Again, you can use PyHP to build up (and share) your own collection of rich, reusable templates and code snippets.

1.3 Active Mechanisms

There are several ways of triggering the execution of Python code.

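From within the HTML code in preprocessed HTML files, there are two techniques: <!--python ... --> comment blocks, whose code is executed and whose printed output appears in the page at that point, and ${ ... }$ markups, which are replaced by the value they resolve to (see the note on storing values below).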
From within python code, you have the options of:
NOTE ON STORING VALUES

Often, you may want to lump all python code together in one place, store results in variables, then reference those variables in your HTML code with ${varname}$ markups.

When resolving a ${something}$ markup, PyHP first tries to retrieve something as an attribute of page.localdata. But if page.localdata doesn't have such an attribute, PyHP then tries to evaluate something as a python expression.
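To make that lookup order concrete, here is a minimal sketch of how a ${ ... }$ resolver could behave. It is not PyHP's actual source: resolve_markups() is a hypothetical helper, localdata stands in for page.localdata, and namespace stands in for whatever globals the real preprocessor evaluates expressions against.

```python
import re
from types import SimpleNamespace

# ${ ... }$ markups; non-greedy so two markups on one line stay separate.
MARKUP = re.compile(r"\$\{(.*?)\}\$", re.DOTALL)
_MISSING = object()


def resolve_markups(html, localdata, namespace):
    """Replace every ${something}$ in html (hypothetical helper)."""

    def substitute(match):
        expr = match.group(1).strip()
        # 1. Try `something` as an attribute of localdata.
        value = getattr(localdata, expr, _MISSING)
        if value is _MISSING:
            # 2. Otherwise evaluate it as a python expression.
            value = eval(expr, namespace)
        return str(value)

    return MARKUP.sub(substitute, html)


localdata = SimpleNamespace(title="Hello")
print(resolve_markups("<h1>${ title }$</h1><p>${ 6 * 7 }$</p>", localdata, {}))
# prints: <h1>Hello</h1><p>42</p>
```

The attribute lookup comes first, so a page.localdata attribute named like a variable will shadow the expression of the same name.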

In your python code, don't assume that any variable you write to will survive past the end of the python code segment. It most likely won't be available in the next python segment. If you need a piece of data to survive, you must write it as an attribute of page.localdata.
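The following sketch models why plain variables vanish while page.localdata survives. It assumes, for illustration only, that each python segment gets a fresh local namespace while sharing the same page object; run_segment() and the SimpleNamespace stand-in for page are hypothetical, not part of pyWeb.

```python
from types import SimpleNamespace


def run_segment(code, page):
    """Run one python segment with a fresh local namespace (hypothetical model)."""
    scratch = {}  # new for every segment, so plain variables don't carry over
    exec(code, {"page": page}, scratch)
    return scratch


# Stand-in for the real page object; only .localdata matters here.
page = SimpleNamespace(localdata=SimpleNamespace())

run_segment("total = 10\npage.localdata.total = total", page)
later = run_segment("saved = page.localdata.total\nlost = 'total' in locals()", page)
print(later["saved"], later["lost"])  # prints: 10 False
```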

1.4 Installation Checklist

If running your PyHP script under CGI, you'll need to check that all the following have been done:




2. Python Execution Environment

2.0 Overview

Your Python code within PyHP pages is executed in the context of the function pyweb.pyhp.preprocess(). The method of execution is the python exec statement.
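Here's a rough sketch of that exec-based mechanism: find each <!--python ... --> block, run it, and splice whatever it prints into the output. preprocess_sketch() is a hypothetical stand-in, not the real pyweb.pyhp.preprocess(), and it is written for Python 3 (contextlib.redirect_stdout captures the print output).

```python
import contextlib
import io
import re
import textwrap

# <!--python ... --> blocks; DOTALL lets the code span several lines.
# Non-greedy, so code that itself contains "-->" would be cut short.
PY_BLOCK = re.compile(r"<!--python(.*?)-->", re.DOTALL)


def preprocess_sketch(html, namespace=None):
    """Replace each python block with whatever it prints (hypothetical model)."""
    namespace = {} if namespace is None else namespace

    def run(match):
        # Allow the block's code to be indented as a whole.
        code = textwrap.dedent(match.group(1))
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            # A fresh copy per block, so plain variables don't leak onward.
            exec(code, dict(namespace))
        return captured.getvalue()

    return PY_BLOCK.sub(run, html)


page = """Below should appear some python output.<br>
<!--python
print('<div style="color:green">Hi from Python!</div>')
-->
And here we are back in regular HTML"""
print(preprocess_sketch(page))
```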

The embedded python code inherits a namespace with a large collection of local and global symbols.

Yes - I know - the namespace is pretty polluted. But if you refrain from assigning to any symbols prefixed by an underscore, e.g. _raw, you should be safe.

We will list the most important symbols here, and give some idea on how to use them.

2.1 Session Symbols

(Or, "What kind of stupid rubbish are you polluting my namespace with?!")

The main symbols of interest are:
For info on what these objects are and how to use them, read the pyWeb manual, Section 2, Document Model.

2.2 Nesting PyHP Scripts

Yes, of course you can nest PyHP scripts within PyHP scripts. You can even pass arguments to whatever you're nesting.

To nest content, you can use one of the functions:
There are also variants of these functions - includeAdd(), includeAddBody() and includeAddRaw(), which take the same arguments as above. The only difference is that instead of returning the preprocessed content as tag objects, they call add() to add the content immediately to the page.

As you can see, you can pass in positional and keyword arguments, to be made available to any python code found within the included script. Within this included script, the positional arguments are available in the tuple args and any keyword arguments in the dict kw; both symbols are in the local namespace.
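A minimal sketch of that argument passing, assuming a hypothetical include_sketch() that takes the nested script's python source directly (the real include() takes a file and preprocesses its HTML as well). The only point it models is that args and kw show up as local names inside the included code.

```python
import contextlib
import io


def include_sketch(source, *args, **kw):
    """Hypothetical include(): run nested python with `args` and `kw` as locals."""
    local_ns = {"args": args, "kw": kw}  # what the included code sees
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(source, {}, local_ns)
    # Returned to the caller; an includeAdd*()-style variant would
    # hand this content to add() instead.
    return captured.getvalue()


nested = "print('<li>%s x%d</li>' % (args[0], kw.get('qty', 1)))"
print(include_sketch(nested, "apples", qty=3), end="")  # prints: <li>apples x3</li>
```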

Hint - you can make includeAdd*() calls, even python execfile() calls, within ${ ... }$ markups within HTML code, e.g.:
<!--python
...
column1 = myobject.getsomecontent()
...
-->
<table>
<tr>
<td>${ column1 }$</td>
<td>${ includeAddBody("fred.py.html") }$</td>
<td>${ execfile("mary.py") }$</td>
</tr>
</table>



2.3 Different Ways of Invoking Python Code

Let's recap the different ways of causing python code to execute, all of which we've already discussed.






3. Examples

I'll get around to this real soon now. I promise :)



4. Security

All active web server frameworks have security risks associated with them.

By definition, all active websites can execute arbitrary code, or else they wouldn't be active. The issue is making sure the server can only execute code which you, as the site owner, want to be executed.

One of the major classes of security risk is the potential for a malicious user to trick the server into executing arbitrary code of the user's own choice. Hackers can leverage this hole to get increasing levels of access to your system, to the point of installing rootkits and taking remote control of your network, quite possibly without you even knowing!

I will explain here how to trick PyHP into executing arbitrary code. Read and understand, and you'll know what to do to protect against this devastating class of vulnerabilities.

To trick a PyHP-enabled server into executing arbitrary code, you must find a way to get the server to write your code to a physical file that resides somewhere in the server's document tree. Once this code is in place, you then send a request to the server with a URL that causes preprocessing of the tainted file. This is basic bread-and-butter hacking, practised for almost as long as active websites have been in existence.

Arguably, this exploit is harder with PyHP. The code which allows for execution of arbitrary Python statements and expressions exists only within the function pyweb.pyhp.preprocess(). As you'll see from reading the source, the only way that pyweb.pyhp.preprocess() can be invoked is when the content being preprocessed is read from a physical file that resides within the server's document root.
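If your own code maps request paths to files (in a CGI wrapper, say), a containment check along these lines keeps the same guarantee. This is a sketch under assumptions, not the check PyHP itself performs; is_inside_docroot() is a hypothetical name.

```python
import os


def is_inside_docroot(docroot, requested):
    """True if the requested URL path resolves to a file under docroot (sketch)."""
    root = os.path.realpath(docroot)
    # realpath() collapses ".." and follows symlinks before the comparison.
    target = os.path.realpath(os.path.join(root, requested.lstrip("/")))
    return os.path.commonpath([root, target]) == root


print(is_inside_docroot("/srv/www", "/shop/cart.py.html"))  # True
print(is_inside_docroot("/srv/www", "/../../etc/passwd"))   # False
```

Comparing resolved paths, rather than checking the URL for "..", also catches symlinks inside the tree that point outside it.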

At the time of writing, I am not aware of anything in the pyWeb/PyHP code itself that allows a user to write arbitrary data to a file in the document tree.

You, as the website programmer, are free to write scripts within your PyHP pages and/or your CGI files that write user form input to files in the document tree. But you are also free to park your car on a dark street in a bad part of town with the windows open, and keys and valuables on the driver's seat.
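One way to stay out of that bad part of town is to make it impossible for user input to land in the document tree at all. The sketch below uses a hypothetical safe_data_path() helper that strips directory parts from a user-supplied filename and refuses any target inside the document root; data_dir is assumed to be a directory outside the tree.

```python
import os


def safe_data_path(data_dir, docroot, filename):
    """Hypothetical helper: choose where to save a user-supplied file.

    Keeps only the base name and refuses any target inside docroot, so
    user input can never become a page that gets preprocessed.
    """
    name = os.path.basename(filename)  # "../www/x.py.html" -> "x.py.html"
    if name in ("", ".", ".."):
        raise ValueError("unusable filename: %r" % filename)
    root = os.path.realpath(docroot)
    target = os.path.realpath(os.path.join(data_dir, name))
    if os.path.commonpath([root, target]) == root:
        raise ValueError("refusing to write user data inside the document root")
    return target


print(safe_data_path("/srv/pyhp-data", "/srv/www", "../www/evil.py.html"))
```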

Basic rules of thumb:
Please, take care!