0. Introduction
0.1 Why Another Damn Preprocessor?
Because PHP sucks, Python rules, and I
wanted an easy HTML templating system, with some of PHP's redeeming
features, that works seamlessly with the pyWeb framework.
0.2 What Does PyHP Do?
If you've ever programmed in PHP,
you'll already have the general feel.
PyHP allows you to embed Python code in HTML files, in a way that
works comfortably with graphical HTML editors like Mozilla Composer.
0.3 Philosophy
I looked at a lot of active web
frameworks, particularly python-based ones. While some of them made
good attempts to separate presentation from logic, they made the HTML
code subordinate to the python code. Or, the markup syntax was plain
ugly (sorry, Spyce people - I just can't get into enclosing all python
lines in double-brackets).
PyHP lets you design your websites visually and top-down.
Typically you'd use your favourite HTML editor (be it Mozilla Composer,
Dreamweaver, Emacs/vi, Notepad, cat, dd,...) and create the visual
appearance of your page.
Then you just stick in markups to add any stuff that can vary - your
dynamic content.
So, the visual design drives the
whole process, and the python code is subordinate. As you create
nice re-usable bits of code, you would ideally factor it out and put it
into a snippets directory, where it can be shared between any number of
pages.
Therefore, you have the freedom to think
with your right brain, and keep a complete grasp of the big picture,
without constantly tripping over the menial i-dotting and t-crossing of
coding issues.
1. Usage
1.1 Quickstart
Let's look at a very minimal
PyHP-powered web page.
As you can see below, Python code is marked up with
<!--python ... --> comment
blocks.
Here we go:
#!/usr/bin/env pyhp
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>My first PyHP page</title>
</head>
<body>
<h1>My First PyHP Page</h1>
Below should appear some python output.<br>
<!--python
print '<div style="color:green">Hi from Python!</div>'
-->
And here we are back in regular HTML
</body>
</html>
Ok, let's break it down:
- The #!/usr/bin/env pyhp line tells your web server to use PyHP to run
this file. Note that this is only essential if you are running PyHP
directly via CGI.
- The <!--python and --> markers form a special HTML comment that tells
PyHP to execute what's inside as Python code.
- The Python code, print '<div style=..., uses print statements to emit
the content that should be delivered to the browser.
So, what does this produce? Something like:
My First PyHP Page
Below should appear some python output.
Hi from Python!
And here we are back in regular HTML
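To make the mechanism concrete, here's a minimal sketch of the idea: find each <!--python ... --> block, run it, and splice whatever it prints back into the page. This is not PyHP's actual code. The name run_python_blocks is made up for illustration, and the sketch uses Python 3's print() function rather than the print statement shown above.

```python
import contextlib
import io
import re
import textwrap

# Matches <!--python ... --> blocks; DOTALL lets the code span several lines.
PYTHON_BLOCK = re.compile(r"<!--python(.*?)-->", re.DOTALL)


def run_python_blocks(html, namespace=None):
    """Replace each <!--python ... --> block with whatever its code prints.

    Hypothetical helper, not part of PyHP.
    """
    namespace = {} if namespace is None else namespace

    def execute(match):
        buffer = io.StringIO()
        code = textwrap.dedent(match.group(1))
        # Each block runs in its own copy of the namespace, so variables
        # assigned in one block are not visible in the next.
        with contextlib.redirect_stdout(buffer):
            exec(code, dict(namespace))
        return buffer.getvalue()

    return PYTHON_BLOCK.sub(execute, html)


page = """<h1>My First PyHP Page</h1>
<!--python
print('<div style="color:green">Hi from Python!</div>')
-->
And here we are back in regular HTML
"""

result = run_python_blocks(page)
print(result)
```

The comment block disappears from the output and the printed div takes its place, which is all the quickstart page relies on.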
1.2 Basic Principles
Firstly, by convention, PyHP files are named with the suffix .py.html.
This indicates that it's a PyHP file, and it also keeps .html as the
final suffix, without which certain idiot-savant editors like Mozilla
Composer will moronically refuse to open the file for visual editing.
Secondly, PyHP files can run either as CGI scripts, or directly under
webservers which support it. (Presently, I'm using Python base classes
to run my own webserver, which runs PyHP and avoids the overhead of
loading a Python VM with every hit. But all going well, I (or someone
else) will write a mod_pyhp.c extension for Apache.)
Thirdly, Python code in PyHP files can call an
include() function to load/execute
other PyHP files, and even pass arguments to these included files.
There is no effective limit to such nesting, and there is no Python
startup overhead.
Fourthly, the python code within PyHP scripts
inherits a rich namespace of powerful
objects, classes and variables. These make it intuitive and
convenient to do stuff like:
- read/set cookies
- load/save up to 4k of arbitrary
data from/to the client browser, which is serialised as
SHA1/HMAC-secured (ie, non-user-molestable) cookies
- change the Content-Type
header which gets sent back to the browser. For instance, there's
nothing to stop you from writing code which generates and delivers (on
the fly) a custom tarball, Flash animation, Ogg audio stream, PNG
image...
- read fields which were set by the browser via GET or POST methods
- generate HTML on the fly using the rich suite of pyWeb classes,
using very readable and maintainable code
Needless to say, your Python code can import any standard (or your own)
python modules to leverage its power.
Lastly, your Python code gets executed in the context of the function
pyweb.pyhp.preprocess(), inheriting
its globals and locals.
Again, you can use PyHP to build up (and share) your own collection of
rich, reusable templates and code snippets.
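For the curious, here's a rough sketch of how SHA1/HMAC-secured cookie data of the kind mentioned above can work in principle. It is not the pyWeb implementation. The names pack, unpack, SECRET_KEY and MAX_COOKIE_BYTES are invented for the example, and JSON plus base64 is just one plausible choice of serialisation.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"   # hypothetical server-side secret, never sent to the client
MAX_COOKIE_BYTES = 4096     # assumed limit, matching the "up to 4k" figure


def pack(data, key=SECRET_KEY):
    """Serialise data and prefix it with an HMAC-SHA1 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode("utf-8"))
    signature = hmac.new(key, payload, hashlib.sha1).hexdigest()
    value = signature + "." + payload.decode("ascii")
    if len(value) > MAX_COOKIE_BYTES:
        raise ValueError("too much data for one cookie")
    return value


def unpack(cookie_value, key=SECRET_KEY):
    """Return the stored data, or None if the signature does not match."""
    signature, _, payload = cookie_value.partition(".")
    payload_bytes = payload.encode("utf-8")
    expected = hmac.new(key, payload_bytes, hashlib.sha1).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload_bytes))


cookie = pack({"user": "fred", "visits": 3})
print(unpack(cookie))
print(unpack(cookie + "x"))
```

The client can read the data but cannot alter it without the signature check failing, since only the server knows the key.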
1.3 Active Mechanisms
There are several ways of triggering the execution of Python code.
From within the HTML code in preprocessed HTML files, there are two techniques:
- Putting the python code into an HTML Comment, <!--python ... --> , as mentioned in the example above
- Putting a python expression into a ${ ... }$ markup
From within python code, you have the options of:
- nesting another preprocessible file with calls to the include*() functions (see below)
- using standard python import statements, and invoking imported callables
- calling the python execfile() function
- generating and adding content directly via inline code
- generating content, and writing it as an attribute of page.localdata, to be later referenced with ${...}$ markups back in HTML-space
NOTE ON STORING VALUES
Often, you may want to lump all python code together in one place,
store results in variables, then reference those variables in your HTML
code with
${varname}$ markups.
When resolving a ${something}$ markup, PyHP first tries to retrieve
something as an attribute of
page.localdata. But if
page.localdata doesn't have such an attribute, PyHP then tries to evaluate
something as a python expression.
In your python code, don't assume that any variable you write to will
survive past the end of the python code segment. It most likely won't
be available in the next python segment. If you need a piece of data to
survive, you must write it as an attribute of page.localdata.
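The lookup order described in this note can be sketched as follows. This is an illustration only, not PyHP's code. expand_markups is a made-up name, and a plain SimpleNamespace stands in for page.localdata.

```python
import re
from types import SimpleNamespace

# Matches ${ ... }$ markups.
MARKUP = re.compile(r"\$\{(.*?)\}\$", re.DOTALL)


def expand_markups(html, localdata, namespace=None):
    """Resolve each ${something}$ markup (hypothetical helper).

    First try `something` as an attribute of localdata; if there is no
    such attribute, evaluate it as a Python expression.
    """
    namespace = {} if namespace is None else namespace

    def resolve(match):
        something = match.group(1).strip()
        if hasattr(localdata, something):
            return str(getattr(localdata, something))
        return str(eval(something, dict(namespace)))

    return MARKUP.sub(resolve, html)


localdata = SimpleNamespace()   # stand-in for page.localdata
localdata.username = "fred"

expanded = expand_markups("<p>Hello ${username}$, 2 + 3 = ${ 2 + 3 }$</p>", localdata)
print(expanded)   # <p>Hello fred, 2 + 3 = 5</p>
```

Here username is found on localdata, while 2 + 3 is not an attribute name and so falls through to expression evaluation.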
1.4 Installation Checklist
If running your PyHP script under CGI,
you'll need to check that all the following have been done:
- The file pyweb/pyhp.py, wherever you have installed pyweb, has
world execute permission
- You have a symlink at /usr/bin/pyhp, which points to the
pyweb/pyhp.py file
- Your PyHP script has world execute permission
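If you'd rather not check these by hand, something like the following sketch could do it for you. check_cgi_install is a hypothetical helper, not something shipped with pyweb, and the example paths at the bottom are placeholders that you'd replace with your own.

```python
import os
import stat


def check_cgi_install(pyhp_py, symlink, script):
    """Return a list of problems found with a CGI setup (hypothetical helper)."""
    problems = []
    for label, path in (("pyhp.py", pyhp_py), ("PyHP script", script)):
        if not os.path.isfile(path):
            problems.append("%s not found: %s" % (label, path))
        elif not os.stat(path).st_mode & stat.S_IXOTH:
            problems.append("%s lacks world execute permission: %s" % (label, path))
    if not os.path.islink(symlink):
        problems.append("no symlink at %s" % symlink)
    elif os.path.realpath(symlink) != os.path.realpath(pyhp_py):
        problems.append("%s does not point to %s" % (symlink, pyhp_py))
    return problems


if __name__ == "__main__":
    # Placeholder paths; adjust to wherever pyweb and your page actually live.
    for problem in check_cgi_install("pyweb/pyhp.py", "/usr/bin/pyhp", "index.py.html"):
        print(problem)
```

An empty list means all three checklist items passed.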
2. Python Execution Environment
2.1 Overview
Your Python code within PyHP pages is
executed in the context of the function pyweb.pyhp.preprocess(). The method
of execution is the python exec
statement.
The embedded python code inherits a namespace with a large collection of
local and global symbols.
Yes - I know - the namespace is pretty polluted. But if you refrain
from assigning to any symbols prefixed by an underscore, eg _raw, you should be safe.
We will list the most important symbols here, and give some idea on how
to use them.
2.1 Session Symbols
(Or, "What kind of
stupid rubbish are you polluting my namespace with?!")
The main symbols of interest are:
- page - a pyweb.httpEmpty object, which has
all info regarding the current client session and current web hit
- localdata (shorthand for page.localdata)
- an object in which you may store data as attributes - guaranteeing
that the data will survive through the whole request (http hit)
- session - a pyweb._httpenv object, shorthand for
http.session
- cookies - a
Cookie.SimpleCookie object, containing all cookies received from the
browser, and allowing you to set cookies to be sent back to the browser
- env - shorthand for session.env - a dict of http session
fields with keys such as REMOTE_ADDR
- fields - all HTTP fields
sent by browser, whether by POST
or GET, in forms and in the URL
- savedata - a dict-like
object which lets you store up to 4k or more, securely, on the client's
browser
- relpath - the 'pathname'
in the request - ie the argument to the HTTP 'GET' or 'POST' header
- pathbits - the components
of relpath broken down as a list. If the request path is '/', this list will contain just ['']
- add - shorthand for page.add. Call this with zero or
more arguments to add stuff to the page
also,
- args - sequence of
arguments passed to python code (if the code is executed as a result of
an include())
- kw - dict of keywords
passed to python code (if code is executed by an include())
For info on what these objects are and how to use them, read the
pyWeb manual, Section 2, Document Model.
2.2 Nesting PyHP Scripts
Yes, of course you can nest PyHP
scripts within PyHP scripts. You can even pass arguments to whatever
you're nesting.
To nest content, you can use one of the functions:
- include(pathname [, arg1, ...],
[key1=value1, ...])
- reads in a PyHP file, and executes any python code therein
- returns the preprocessed text as a pyWeb tag object
- includeBody(pathname [,
arg1, ...], [key1=value1, ...])
- as for include(),
but ignores everything except what's within the <body...> ...
</body> tags
- includeRaw(string [, arg1,
...], [key1=value1, ...])
- as for include(),
but accepts a raw string instead of a filename
There are also variants of these functions -
includeAdd(),
includeAddBody() and
includeAddRaw(),
which take the same arguments as above. The only difference is that
instead of returning the preprocessed content as tag objects, they call
add() to add the content immediately to the page.
As you can see, you can pass in arguments and keywords, to be made
available to any python code found within the included script. Within
this included script, the non-keyword arguments are available in the
tuple args, and any keyword arguments in the dict kw. Both symbols are
available in the local namespace.
Hint - you can make includeAdd*() calls, even python execfile() calls, within ${ ... }$ markups within HTML code, eg:
<!--python
...
column1 = myobject.getsomecontent()
...
-->
<table>
<tr>
<td>${ column1 }$</td>
<td>${ includeAddBody("fred.py.html") }$</td>
<td>${ execfile("mary.py") }$</td>
</tr>
</table>
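To illustrate how args and kw reach the included code, here's a small self-contained sketch. It is not how PyHP implements include(). The functions include_raw and include_body are stand-ins that take a string and return a string, whereas the real functions return pyWeb tag objects, and the code inside the snippet uses Python 3's print().

```python
import contextlib
import io
import re
import textwrap

PYTHON_BLOCK = re.compile(r"<!--python(.*?)-->", re.DOTALL)
BODY = re.compile(r"<body[^>]*>(.*?)</body>", re.DOTALL | re.IGNORECASE)


def include_raw(text, *args, **kw):
    """Run the python blocks in text, exposing args and kw to them.

    Hypothetical stand-in for includeRaw().
    """
    def execute(match):
        buffer = io.StringIO()
        scope = {"args": args, "kw": kw}
        with contextlib.redirect_stdout(buffer):
            exec(textwrap.dedent(match.group(1)), scope)
        return buffer.getvalue()

    return PYTHON_BLOCK.sub(execute, text)


def include_body(text, *args, **kw):
    """As include_raw(), but keep only what's inside <body> ... </body>."""
    match = BODY.search(text)
    return include_raw(match.group(1) if match else "", *args, **kw)


snippet = """<html><head><title>ignored</title></head><body>
<!--python
print("Hello %s, you have %d new messages" % (args[0], kw["count"]))
-->
</body></html>"""

print(include_body(snippet, "fred", count=3))
```

The positional argument shows up as args[0] and the keyword as kw["count"], just as described above, and everything outside the body tags is dropped.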
2.3 Different Ways of Invoking Python Code
To recap, the different ways of causing the execution of python code are those listed in Section 1.3, Active Mechanisms.
3. Examples
I'll get around to this real soon now. I promise :)
4. Security
All active web server frameworks have security risks associated with them.
By definition, all active websites can execute arbitrary code, or else
they wouldn't be active. The issue is making sure the server can only
execute code which you, as the site owner,
want to be executed.
One of the major classes of security risk is the potential for a
malicious user to trick the server into executing arbitrary code of the
user's own choice. Hackers can leverage this hole to get increasing
levels of access to your system, to the point of installing rootkits
and taking remote control of your network, quite possibly without you
even knowing!
I will explain here how to trick PyHP into executing arbitrary code.
Read and understand, and you'll know what to do to protect against this
devastating class of vulnerabilities.
To trick a PyHP-enabled server into executing arbitrary code, you must
find a way to get the server to write your code to a physical file that
resides somewhere in the server's document tree. Once this code is in
place, you then send a request to the server with a URL that causes
preprocessing of the tainted file. This is basic bread-and-butter
hacking, practised for almost as long as active websites have been in
existence.
Arguably, this exploit is harder with PyHP. The code which allows for
execution of arbitrary Python statements and expressions exists only
within the function
pyweb.pyhp.preprocess().
As you'll see from reading the source, the only way that
pyweb.pyhp.preprocess() can be invoked is when the content being
preprocessed is read from a physical file that resides within the
server's document root.
At time of writing this, I am aware of nothing within the pyWeb/PyHP code
itself that allows a user to write arbitrary data to a file on the document tree.
You, as the website programmer, are free to write script within your
PyHP pages and/or within your CGI files that allows user form input to
be written to files on the document tree. But you are also free to park
your car on a dark street in a bad part of town with the windows open,
and keys and valuables on the driver seat.
Basic rules of thumb:
- If you're storing user-enterable data on-site, never, never, NEVER
store it on a file that resides within the server document tree and can
be directly requested by a GET. Stick it somewhere else, in a database,
or outside the document tree
- If you must break this rule, then ALWAYS vet all HTTP fields containing user data, and flatly reject any input that contains the strings "$[", "]$", "${", "}$". If you are diligent with this, then pyWeb/PyHP should never of itself be a cause of your system being breached.
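A vetting routine along these lines might look like the sketch below. The names FORBIDDEN_MARKERS, is_safe_field and vet_fields are made up for the example, and a plain dict stands in for the fields object. Note that the "<!--python" marker is not in the list above. I've added it here as an assumption, since comment blocks get executed too.

```python
# Markup delimiters that must never appear in stored user input.
# "<!--python" is an addition for this sketch, not part of the rule above.
FORBIDDEN_MARKERS = ("$[", "]$", "${", "}$", "<!--python")


def is_safe_field(value):
    """True if value contains none of the forbidden markers."""
    return not any(marker in value for marker in FORBIDDEN_MARKERS)


def vet_fields(fields):
    """Return the names of fields whose values must be rejected."""
    return [name for name, value in fields.items()
            if not is_safe_field(str(value))]


submitted = {
    "name": "fred",
    "bio": "${ __import__('os').system('id') }$",
}
rejected = vet_fields(submitted)
print(rejected)   # ['bio']
```

If vet_fields() returns anything at all, refuse the whole request rather than trying to clean up the offending values.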
Please, take care!