| Module pyweb :: Class htmlRipper |

Class hierarchy: ParserBase -> HTMLParser -> htmlRipper

This is a handy class which supports the use of HTML files as templates, and is intended to be used in conjunction with the 'template=' constructor keyword for webwidget objects.
Think of it as a fast 'poor man's DOM'.
With this class, you can prepare an HTML file with your favourite editor (Mozilla Composer, OpenOffice.org, Emacs/vim, cat, etc.), then pass it to the constructor of this class.
Methods of this class let you search for content in the file, by tag, id or any attributes, and return that content in its original rendered form.
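
As a rough illustration of the 'poor man's DOM' idea, the sketch below uses Python 3's standard html.parser module to keep the raw pieces of a page and look an entity up by its id. It is not pyweb's code: TinyRipper, items, entities, render and get_id are hypothetical names chosen for this example, and end tags and character references are re-spelled rather than copied byte for byte. The hyphen prefix mirrors the behaviour documented for getId.

```python
from html.parser import HTMLParser


class TinyRipper(HTMLParser):
    """Hypothetical stand-in for htmlRipper; not the pyweb implementation."""

    def __init__(self, text):
        # convert_charrefs=False keeps references such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.items = []      # raw pieces of the document, in source order
        self.open_tags = []  # stack of (tag, attrs, index of the start item)
        self.entities = []   # closed entities, in the order their tags close
        self.feed(text)
        self.close()

    def handle_starttag(self, tag, attrs):
        self.open_tags.append((tag, dict(attrs), len(self.items)))
        self.items.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are stored as one raw item.
        self.items.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.items.append("</%s>" % tag)
        # Close the nearest open tag with the same name, if there is one.
        for pos in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[pos][0] == tag:
                name, attrs, start = self.open_tags[pos]
                del self.open_tags[pos:]
                self.entities.append({"tag": name, "attrs": attrs,
                                      "start": start,
                                      "end": len(self.items) - 1})
                break

    def handle_data(self, data):
        self.items.append(data)

    def handle_entityref(self, name):
        self.items.append("&%s;" % name)

    def handle_charref(self, name):
        self.items.append("&#%s;" % name)

    def render(self, entity, contents_only=False):
        """Join the raw items of an entity back into text."""
        first, last = entity["start"], entity["end"]
        if contents_only:
            first, last = first + 1, last - 1
        return "".join(self.items[first:last + 1])

    def get_id(self, ident):
        """Return the entity whose id attribute matches, or ''."""
        contents_only = ident.startswith("-")
        if contents_only:
            ident = ident[1:]
        for entity in self.entities:
            if entity["attrs"].get("id") == ident:
                return self.render(entity, contents_only)
        return ""


page = TinyRipper('<body><div id="main"><p class="x">Hi &amp; bye</p></div></body>')
print(page.get_id("main"))   # <div id="main"><p class="x">Hi &amp; bye</p></div>
print(page.get_id("-main"))  # <p class="x">Hi &amp; bye</p>
```

Storing raw items plus index ranges is only one possible design; it was picked here because the getItem and getRange descriptions suggest that htmlRipper also works with indexed raw items.
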
See the method docstrings for more info.

| Method Summary | |
|---|---|
| __init__(self, fileOrStr, **kw) | |
| __getitem__(self, idx) | This allows a dirty shorthand for extracting content by tag, instancenum, id and/or attributes. |
| getEntity(self, entity, contentsOnly=0) | Lower-level method which returns the nth instance of an entity. |
| getEntityTag(self, tag='', idx=0, **kw) | Searches for the nth instance of an entity with matching tag name and/or attributes. |
| getId(self, id) | Renders the entity named 'id' (ie, the tag with an attribute 'id' set to id), or returns empty string if no entity found. |
| getItem(self, item) | Renders an item back to its raw html. item can be an index or an entity dict. |
| getRange(self, fromidx, toidx) | Renders a range of raw items as with getItem; this is probably too low level to be of much use. |
| handle_comment(self, data) | |
| handle_data(self, data) | |
| handle_decl(self, data) | |
| handle_endtag(self, tag) | |
| handle_startendtag(self, tag, attr) | |
| handle_starttag(self, tag, attrs) | |
| Inherited from HTMLParser | |
| | Handle any buffered data. |
| | Feed data to the parser. |
| | Return full source of start tag: '<...>'. |
| | Reset this instance. |
| Inherited from ParserBase | |
| | Return current line number and offset. |

| Class Variable Summary | |
|---|---|
| Inherited from HTMLParser | |
| tuple | CDATA_CONTENT_ELEMENTS = ('script', 'style') |

| Method Details | |
|---|---|
| __getitem__(self, idx) | |
| getEntity(self, entity, contentsOnly=0) | Lower-level method which returns the nth instance of an entity. Note that the order of entities is the order in which their tags are closed in the original file. Returns the entity's text fully rendered. The optional 'contentsOnly' argument, if true, causes only the *contents* of the entity, and *not* its opening/closing tags, to be returned. |
| getEntityTag(self, tag='', idx=0, **kw) | Searches for the nth instance of an entity with matching tag name and/or attributes. Arguments: tag='', idx=0, **kw. Note: if the 'tag' argument is prefixed with a hyphen '-', then only the tag's contents are returned; the opening and closing tags are dropped. Note: access to this method is short-handed in the __getitem__ method, which perversely allows the htmlRipper object to be subscripted. See __getitem__ for more info. |
| getId(self, id) | Renders the entity named 'id' (ie, the tag with an attribute 'id' set to id), or returns empty string if no entity found. If found, returns the full text of the original entity, including its opening/closing tags and all content. If the id is prefixed with a hyphen '-', the entity's start/end tags are not included, only the contents. |
| getItem(self, item) | Renders an item back to its raw html. item can be an index or an entity dict. Note that this only returns the opening tag, so it probably won't be much use for clients. |
| getRange(self, fromidx, toidx) | Renders a range of raw items as with getItem; this is probably too low level to be of much use. |
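
The getEntity note about ordering is easy to misread, so here is a small sketch of what "the order in which their tags are closed" means for nested markup. It uses Python 3's standard html.parser rather than htmlRipper itself; CloseOrder and tag_orders are hypothetical names for this example only. If htmlRipper numbers entities this way, an enclosing element would be numbered after the elements it contains.

```python
from html.parser import HTMLParser


class CloseOrder(HTMLParser):
    """Records tag names in opening order and in closing order."""

    def __init__(self):
        super().__init__()
        self.opened = []  # tag names as their start tags are seen
        self.closed = []  # tag names as their end tags are seen

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


def tag_orders(text):
    """Parse text and return (opening order, closing order)."""
    parser = CloseOrder()
    parser.feed(text)
    parser.close()
    return parser.opened, parser.closed


opened, closed = tag_orders("<div><p>a</p><span>b</span></div>")
print(opened)  # ['div', 'p', 'span']
print(closed)  # ['p', 'span', 'div']
```
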
| Generated by Epydoc 2.0 on Sat Feb 7 20:08:05 2004 | http://epydoc.sf.net |