python-beautifulsoup - HTML/XML Parser for Quick-Turnaround Applications Like Screen-Scraping

Property Value
Distribution openSUSE Leap 42.2
Repository openSUSE Network Utilities all
Package name python-beautifulsoup
Package version 3.2.1
Package release 23.2
Package architecture noarch
Package type rpm
Installed size 223.82 KB
Download size 57.00 KB
Official Mirror
Beautiful Soup is a Python HTML/XML parser designed for quick-turnaround
projects like screen-scraping. Three features make it powerful:
* Beautiful Soup won't choke if you give it bad markup. It yields a parse tree
that makes approximately as much sense as your original document. This is
usually good enough to collect the data you need and run away.
* Beautiful Soup provides a few simple methods and Pythonic idioms for
navigating, searching, and modifying a parse tree: a toolkit for dissecting a
document and extracting what you need. You don't have to create a custom
parser for each application.
* Beautiful Soup automatically converts incoming documents to Unicode and
outgoing documents to UTF-8. You don't have to think about encodings, unless
the document doesn't specify an encoding and Beautiful Soup can't autodetect
one. Then you just have to specify the original encoding.
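
The fallback case in the third feature can be sketched with the standard library alone. This is not Beautiful Soup's own detection logic; it only shows the round trip the description promises once the caller supplies the original encoding. The function name and the sample bytes are hypothetical.

```python
def to_unicode_then_utf8(raw_bytes, original_encoding):
    """Decode bytes with a caller-supplied encoding, then re-encode as UTF-8."""
    text = raw_bytes.decode(original_encoding)  # incoming document -> Unicode
    return text, text.encode("utf-8")           # outgoing document -> UTF-8


# Hypothetical sample: Latin-1 bytes with no encoding declaration.
raw = b"<p>caf\xe9</p>"
text, utf8 = to_unicode_then_utf8(raw, "iso-8859-1")
```

In Beautiful Soup 3 itself, the original encoding is normally passed through the fromEncoding constructor argument instead of being decoded by hand.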
Beautiful Soup parses anything you give it, and does the tree traversal stuff
for you. You can tell it "Find all the links", or "Find all the links of class
externalLink", or "Find all the links whose URLs match "", or "Find the
table heading that's got bold text, then give me that text."
Valuable data that was once locked up in poorly-designed websites is now within
your reach. Projects that would have taken hours take only minutes with
Beautiful Soup.
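
To make the first two queries quoted above concrete, here is a minimal sketch that uses Python 3's standard-library html.parser rather than Beautiful Soup, since the 3.x series packaged here targets Python 2.7. The class name, function name, and sample markup are all hypothetical; the sketch only shows what "find all the links" and "find all the links of class externalLink" amount to.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, optionally filtered by class."""

    def __init__(self, css_class=None):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # A valueless attribute is reported as None, so fall back to "".
        classes = (attr_map.get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            href = attr_map.get("href")
            if href is not None:
                self.links.append(href)


def find_links(html, css_class=None):
    """Return the href of every matching link in the given markup."""
    parser = LinkCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample with deliberately unclosed tags.
sample = ('<p><a href="/home">Home</a> '
          '<a class="externalLink" href="http://example.org/">Out</a><b>unclosed')
all_links = find_links(sample)
external_links = find_links(sample, "externalLink")
```

With Beautiful Soup 3 installed, the same two queries are usually written as soup.findAll('a') and soup.findAll('a', {'class': 'externalLink'}).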


Package Version Architecture Repository
python-beautifulsoup-3.2.1-11.1.noarch.rpm 3.2.1 noarch openSUSE Oss
python-beautifulsoup - - -


Name Value
python(abi) = 2.7


Name Value
python-beautifulsoup = 3.2.1-23.2


Type URL
Binary Package python-beautifulsoup-3.2.1-23.2.noarch.rpm
Source Package python-beautifulsoup-3.2.1-23.2.src.rpm

Install Howto

  1. Add the openSUSE Network Utilities repository:
    # zypper addrepo opensuse-network-utilities
  2. Install python-beautifulsoup rpm package:
    # zypper install python-beautifulsoup
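
After the two steps above, a quick way to confirm the installation is to try importing the module. This is a small sketch, assuming the 3.x series installs a top-level module named BeautifulSoup and that the script is run with the Python 2.7 interpreter the package requires; the function name is hypothetical.

```python
def beautifulsoup3_importable():
    """Return True if the BeautifulSoup 3 module can be imported."""
    try:
        import BeautifulSoup  # noqa: F401 (module name used by the 3.x series)
    except (ImportError, SyntaxError):
        # ImportError: not installed for this interpreter.
        # SyntaxError: Python 2 source picked up by a Python 3 interpreter.
        return False
    return True


print("BeautifulSoup 3 importable: %s" % beautifulsoup3_importable())
```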




2013-07-15 -
- Use upstream URL
- Run testsuite
2013-02-11 -
- Spec file cleanup, should fix 12.1 build
2012-02-21 -
- Update to 3.2.1
* Substitute XML entities for angle brackets and bare ampersands within
strings, not just within attribute values. This prevents a possible
cross-site scripting attack when Beautiful Soup is used to sanitize HTML.
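
As an illustration of the kind of substitution described in the 3.2.1 entry, and not Beautiful Soup's actual implementation, the sketch below escapes angle brackets and bare ampersands in a string while leaving existing entity references alone. The regular expression and function name are assumptions made for this example.

```python
import re

# A "bare" ampersand is one that does not begin a named or numeric
# entity reference such as &amp; or &#38; or &#x26;.
BARE_AMPERSAND = re.compile(r"&(?!#\d+;|#[xX][0-9a-fA-F]+;|\w+;)")


def substitute_xml_entities(text):
    """Escape bare ampersands first, then angle brackets."""
    text = BARE_AMPERSAND.sub("&amp;", text)
    return text.replace("<", "&lt;").replace(">", "&gt;")


escaped = substitute_xml_entities("<script>alert(1)</script> AT&T &amp; &#38; co")
```

Escaping the ampersands before the angle brackets matters: doing it the other way round would re-examine the ampersands introduced by &lt; and &gt;.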
2011-12-09 -
- fix license to be in format
2011-11-25 -
- Update to 3.2.0
- Gave the stable series a higher version number than the unstable series,
to make it very clear which series most people should be using.
- When creating a Tag object, you can specify its attributes as a dict
rather than as a list of 2-tuples.
2010-07-06 -
- fix dates in changelog
2010-04-10 -
- Update to;
- Spec file cleaned with spec-cleaner.
2010-01-08 -
- Update to 3.0.8;
- Building as noarch for openSUSE >= 11.2.
2008-12-09 -
- Update to 3.0.7a
- Release 3.0.7a (2008/07/03)
- Added an import that makes BS work in Python 2.3.
- Release 3.0.7 (2008/06/22)
- Fixed a UnicodeDecodeError when unpickling documents that contain non-ASCII characters.
- Fixed a TypeError that occurred in some circumstances when a tag contained no text.
- Jump through hoops to avoid the use of chardet, which can be slow in some circumstances. UTF-8 documents should never trigger the use of chardet.
- Whitespace is preserved inside <pre> and <textarea> tags that contain nothing but whitespace.
- Beautiful Soup can now parse a doctype that's scoped to an XML namespace.
- Update to 3.0.6
- Release 3.0.6 (2008/04/26)
- Added a Tag.decompose() method to disconnect a tree or subset, breaking it into bite-sized pieces for the garbage collector to collect.
- Got rid of a very old debug line that prevented chardet from working.
- Tag.extract() now returns the tag that was extracted.
- Tag.findNext() now does something with the keyword arguments you pass it instead of dropping them on the floor.
- Fixed a Unicode conversion bug.
- Fixed a bug that garbled some tags when rewriting them.
2007-12-18 -
- Update to 3.0.5:
- Beautiful Soup is now licensed under a BSD-style license
- Soup objects can now be pickled, and copied with copy.deepcopy
- Tag.append now works properly on existing BS objects. (It wasn't originally
intended for outside use, but it can be now.) (Giles Radford)
- Passing in a nonexistent encoding will no longer crash the parser on Python
2.4 (John Nagle)
- Fixed an underlying bug in SGMLParser that thinks ASCII has 255 characters
instead of 127 (John Nagle)
- Entities are converted more consistently to Unicode characters
- Entity references in attribute values are now converted to Unicode
characters when appropriate. Numeric entities are always converted, because
SGMLParser always converts them outside of attribute values
- ALL_ENTITIES happens to just be the XHTML entities, so I renamed it to
- The regular expression for bare ampersands was too loose. In some cases
ampersands were not being escaped. (Sam Ruby?)
- Non-breaking spaces and other special Unicode space characters are no
longer folded to ASCII spaces. (Robert Leftwich)
- Information inside a TEXTAREA tag is now parsed literally, not as HTML
tags. TEXTAREA now works exactly the same way as SCRIPT. (Zephyr Fang)

See Also

Package Description
python-ftputil-2.6-1.3.noarch.rpm High-level FTP client library (virtual filesystem and more)
python-gdata-2.0.18-53.14.noarch.rpm Python library to access data through Google Data APIs
python-gdata-doc-2.0.18-53.14.noarch.rpm Documentation files for python-gdata
python-oauth2-1.9-11.2.noarch.rpm A fully tested, abstract interface to creating OAuth clients and servers
python-oauth2-test-1.9-11.2.noarch.rpm Tests for python-oauth2
python-optcomplete-1.2_devel-29.1.noarch.rpm Automatic Shell Completion Support for Scripts Using Optparse
python-pyinotify-0.9.6-38.1.noarch.rpm Python module for watching filesystem changes
python-simplejson-3.14.0-69.1.x86_64.rpm Extensible JSON encoder/decoder for Python
python-simplejson-test-3.14.0-69.1.x86_64.rpm Tests for python-simplejson
python-tweepy-3.5.0-16.1.noarch.rpm Twitter library for python
python-twitter-1.1-2.2.noarch.rpm A python wrapper around the Twitter API
python-xmpppy-0.4.1-4.1.noarch.rpm Jabber Library for Python
python-youtube-dl-2018.05.01-191.1.noarch.rpm A python module for downloading from video sites for offline watching
python2-net-snmp-5.7.3-22.1.x86_64.rpm The Python 'netsnmp' module for the Net-SNMP
python3-gdata-2.0.18-53.14.noarch.rpm Python library to access data through Google Data APIs