Fun with Beautiful Soup
So how easy is it to yank out all the links in a post and display them as a list on their own? Well, with a little Python love and some Django filters, it’s as easy as microwave burritos. Before we get started you’ll need a fresh copy of ElementTree and BeautifulSoup.
ElementTree and BeautifulSoup together form a very handy toolbox for parsing structured data. ElementTree is used a lot for parsing XML, and BeautifulSoup is tailored more towards parsing HTML and XHTML. Download both of these guys and drop them on your Python path so they can be imported from within your Django app.
Once those guys are seated you can then write a very simple templatetag filter.
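from django import template
from django.conf import settings

register = template.Library()

@register.filter
def get_links(value):
    """Return every <a> tag found in the HTML string `value`."""
    try:
        try:
            from BeautifulSoup import BeautifulSoup
        except ImportError:
            from beautifulsoup import BeautifulSoup
        soup = BeautifulSoup(value)
        return soup.findAll('a')
    except ImportError:
        # BeautifulSoup is missing: complain in DEBUG, otherwise pass the value through untouched.
        if settings.DEBUG:
            raise template.TemplateSyntaxError("Error in 'get_links' filter: BeautifulSoup isn't installed.")
        return value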
Save that into your templatetags directory, load it in your template with {% load %}, and use it like so:
<ul>
    {% for link in object.body|get_links %}
        <li><a href="{{ link.href }}">{{ link.title }}</a></li>
    {% endfor %}
</ul>
The above example assumes you’re working with an object that has a field called body which contains simple HTML structured data. The next assumption is that each anchor has a title attribute. If you didn’t want to mess with titles, just say {{ link }} instead. For more on templatetags consume some tasty Django Documentation. Enjoy!
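If you want to poke at the idea outside of Django, or your anchors don't all carry a title, here is a rough sketch of the same link extraction. To be clear, it swaps BeautifulSoup for the standard library's html.parser so it runs with nothing installed, and LinkCollector, collect_links and the "label" key are names I made up for illustration, not part of the filter above. The label falls back from the title to the link text to the href.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href, title, and visible text for every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished links, in document order
        self._current = None  # the <a> tag currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            self._current = {
                "href": attributes.get("href") or "",
                "title": attributes.get("title") or "",
                "text": "",
            }

    def handle_data(self, data):
        # Only text that sits inside an open <a> counts as link text.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def collect_links(html):
    """Return one dict per anchor; 'label' falls back from title to text to href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    for link in parser.links:
        link["label"] = link["title"] or link["text"].strip() or link["href"]
    return parser.links


body = '<p>Try <a href="/soup/" title="Soup docs">this</a> or <a href="/django/">Django</a>.</p>'
for link in collect_links(body):
    print(link["href"], "->", link["label"])
```

The first link prints with its title and the second, which has no title, prints with its link text. BeautifulSoup is far more forgiving of messy markup than this, so treat it as a toy for seeing what the filter is pulling out rather than a replacement.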
UPDATE: I may have inadvertently implied that you needed ElementTree to use BeautifulSoup. This is in fact wrong. BeautifulSoup can play by itself.
Remarks
sandro
Very cool implementation! After reading this entry I thought “okay… but where’s the implementation?” Then I saw it right there in front of me on the left side, the post got about 10X better after seeing how the implementation improves meta info on your post! I bet this is the same way django documentation works, how the right hand side links to subheading ids. Great!
Adam Spooner
Awesome! Thanks for the snippet Nathan! I had never heard of BeautifulSoup before. I’m going to check into using it in a current project where I’m hoping to extract all the ‘p’ elements from a document.
If you’re taking suggestions on little tutorials like this one, then I’d love to see how you get some of your Flash work to play nicely with Django and Python.
Cheers!
Remarks are closed.