TL;DR: I forked collective.transmogrifier into plain transmogrifier (not released yet) to make its core usable without Plone dependencies, to use Chameleon for TAL-expressions, to make it installable with plain pip install, and to make it compatible with Python 3.
Transmogrifier is one of the many great developer tools by the Plone community. It's a generic pipeline tool for data manipulation, configurable with plain-text INI-files, while new re-usable pipeline section blueprints can be implemented in Python. It could be used to process any number of things, but historically it has mainly been developed and used as a pluggable way to import legacy content into Plone.
A simple transmogrifier pipeline for dumping news from Slashdot to a CSV file could look like this:
[transmogrifier]
pipeline=
    from_rss
    to_csv

[from_rss]
blueprint=transmogrifier.from_expression
modules=feedparser
expression=python:modules['feedparser'].parse(options['url']).get('entries', [])
url=http://rss.slashdot.org/slashdot/slashdot

[to_csv]
blueprint=transmogrifier.to_csv
fieldnames=
    title
    link
filename=slashdot.csv
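Conceptually, each pipeline section is a generator that wraps the iterable produced by the previous section and yields item dicts onward, adding, transforming or dropping items on the way. A rough sketch of the model (not the actual implementation):

def run_pipeline(sections):
    # Each section wraps the iterable produced by the previous one and
    # yields (possibly transformed, added or dropped) item dicts onward.
    items = iter(())  # the first section starts from an empty "previous"
    for section in sections:
        items = section(items)
    # Driving the final iterable is what makes the whole chain run;
    # constructor-like sections at the end consume items as a side effect.
    for item in items:
        pass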
Actually, I have yet to do any Plone migrations using transmogrifier. But when we recently had a reasonably sized non-Plone migration task, I knew not to re-invent the wheel, but to transmogrify it. And we succeeded. The transmogrifier pipeline helped us to design the migration better, and splitting the data processing into multiple pipeline sections made it easy to delegate the work to multiple developers.
Unfortunately, collective.transmogrifier currently has unnecessary dependencies on CMFCore, is not installable without a long known-good set of versions, and is completely missing a command-line interface. At first, I tried to do all the necessary refactoring inside collective.transmogrifier, but eventually a fork was required to make the transmogrifier core usable outside Plone environments.
So, meet the new transmogrifier:
- can be installed with pip install (although not yet released on PyPI)
- new mr.migrator-inspired command-line interface: transmogrify --help
- new base classes for custom blueprints (see the sketch after this list)
  - transmogrifier.blueprints.Blueprint
  - transmogrifier.blueprints.ConditionalBlueprint
- new ZCML-directives for registering blueprints and re-usable pipelines
  - <transmogrifier:blueprint component="" name="" />
  - <transmogrifier:pipeline id="" name="" description="" configuration="" />
- uses Chameleon for TAL-expressions (e.g. in ConditionalBlueprint)
- has only a few generic built-in blueprints
- supports z3c.autoinclude plugins targeting the transmogrifier package
- fully backwards compatible with blueprints written for collective.transmogrifier
- runs on Python >= 2.6, including Python 3
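To give an idea of the new base classes, here is a minimal sketch of a custom filtering blueprint built on transmogrifier.blueprints.Blueprint; ConditionalBlueprint is meant to let the same kind of per-item condition be written as a TAL-expression in the pipeline configuration instead (the 'title' key here is only for illustration):

from transmogrifier.blueprints import Blueprint


class SkipUntitled(Blueprint):
    """Yield onward only items with a non-empty 'title' key."""

    def __iter__(self):
        for item in self.previous:
            if item.get('title'):
                yield item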
There's still much work to do before a real release (at least documenting and testing the new CLI-script and the new built-in blueprints), but let's see how it works anyway...
Example pipeline
Let's start with an easy installation
$ pip install git+https://github.com/datakurre/transmogrifier
$ transmogrify --help
Usage: transmogrify <pipelines_and_overrides>...
[--overrides=<path/to/pipeline/overrides.cfg>]
[--context=<path.to.context.factory>]
transmogrify --list
transmogrify --show=<pipeline>
and with an example filesystem pipeline.cfg
[transmogrifier]
pipeline=
    from_rss
    to_csv

[from_rss]
blueprint=transmogrifier.from_expression
modules=feedparser
expression=python:modules['feedparser'].parse(options['url']).get('entries', [])
url=http://rss.slashdot.org/slashdot/slashdot

[to_csv]
blueprint=transmogrifier.to_csv
fieldnames=
    title
    link
filename=slashdot.csv
and its dependencies
$ pip install feedparser
and the results
$ transmogrify pipeline.cfg
INFO:transmogrifier:CSVConstructor:to_csv saved 25 items to /.../slashdot.csv
using Python 3.
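For comparison, the same RSS-to-CSV dump written as a plain Python 3 script (using feedparser and the standard library csv module, with the same field names as in the pipeline above) would look roughly like this:

import csv
import feedparser

entries = feedparser.parse(
    'http://rss.slashdot.org/slashdot/slashdot').get('entries', [])

with open('slashdot.csv', 'w', newline='') as output:
    # Only the 'title' and 'link' keys are written; all other feed
    # entry keys are ignored.
    writer = csv.DictWriter(output, fieldnames=['title', 'link'],
                            extrasaction='ignore')
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry)

The pipeline version does the same job, but every step can be replaced, reordered or extended from the configuration alone.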
Example migration project
Let's create a migration project with custom blueprints using Python 3.
In addition to transmogrifier, we also need z3c.autoinclude (patched for Python 3) and venusianconfiguration for easy blueprint registration:
$ pip install git+https://github.com/datakurre/transmogrifier
$ pip install git+https://github.com/datakurre/venusianconfiguration
$ pip install git+https://github.com/datakurre/z3c.autoinclude
Then, our working directory must contain a simple setup.py to declare a package for our custom blueprints:
from setuptools import setup, find_packages

setup(
    name='blueprints',
    packages=find_packages(exclude=['ez_setup']),
    install_requires=[
        'setuptools',
        'transmogrifier',
        'venusianconfiguration',
        'fake-factory'
    ],
    entry_points="""
    # -*- Entry points: -*-
    [z3c.autoinclude.plugin]
    target = transmogrifier
    """
)
Finally, we must create a sub-folder for our Python modules.
$ mkdir blueprints
$ touch blueprints/__init__.py
$ touch blueprints/configure.py
And register the package to our Python environment (virtualenv recommended):
$ python setup.py develop
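For reference, the working directory should now look roughly like this:

./setup.py
./blueprints/__init__.py
./blueprints/configure.py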
Now, we can register custom blueprints in blueprints/configure.py
from venusianconfiguration import configure
from transmogrifier.blueprints import Blueprint
from faker import Faker


@configure.transmogrifier.blueprint.component(name='faker_contacts')
class FakerContacts(Blueprint):
    def __iter__(self):
        # First pass through all items from the previous sections
        for item in self.previous:
            yield item

        # Then yield the requested amount of new fake contact items
        amount = int(self.options.get('amount', '0'))
        fake = Faker()

        for i in range(amount):
            yield {
                'name': fake.name(),
                'address': fake.address()
            }
and see it registered
$ transmogrify --list
Available blueprints
--------------------
faker_contacts
...
and make an example pipeline.cfg
[transmogrifier]
pipeline=
    from_faker
    to_csv

[from_faker]
blueprint=faker_contacts
amount=2

[to_csv]
blueprint=transmogrifier.to_csv
and enjoy the results
$ transmogrify pipeline.cfg to_csv:filename=-
address,name
"534 Hintz Inlet Apt. 804
Schneiderchester, MI 55300",Dr. Garland Wyman
"44608 Volkman Islands
Maryleefurt, AK 42163",Mrs. Franc Price DVM
INFO:transmogrifier:CSVConstructor:to_csv saved 2 items to -
Mandatory example with Plone
Using the new transmogrifier with Plone should be as simple as adding it to your buildout.cfg next to the old transmogrifier packages:
[buildout]
extends=http://dist.plone.org/release/4.3-latest/versions.cfg
parts=instance
versions=versions
extensions=mr.developer
sources=sources
auto-checkout=*

[sources]
transmogrifier=git https://github.com/datakurre/transmogrifier

[instance]
recipe=plone.recipe.zope2instance
eggs=
    Plone
    z3c.pt
    transmogrifier
    collective.transmogrifier
    plone.app.transmogrifier
user=admin:admin
zcml=plone.app.transmogrifier

[versions]
setuptools=
zc.buildout=
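After that, the instance is built as usual (assuming a standard bootstrap-based buildout; adjust to your own setup):

$ python bootstrap.py
$ bin/buildout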
Let's also write a fictional migration pipeline, which would create Plone content from the Slashdot RSS-feed:
[transmogrifier]
pipeline=
    from_rss
    id
    fields
    folders
    create
    update
    commit

[from_rss]
blueprint=transmogrifier.from_expression
modules=feedparser
expression=python:modules['feedparser'].parse(options['url']).get('entries', [])
url=http://rss.slashdot.org/Slashdot/slashdot

[id]
blueprint=transmogrifier.expression
modules=uuid
id=python:str(modules['uuid'].uuid4())

[fields]
blueprint=transmogrifier.expression
portal_type=string:Document
text=path:item/summary
_path=string:slashdot/${item['id']}

[folders]
blueprint=collective.transmogrifier.sections.folders

[create]
blueprint=collective.transmogrifier.sections.constructor

[update]
blueprint=plone.app.transmogrifier.atschemaupdater

[commit]
blueprint=transmogrifier.from_expression
modules=transaction
expression=python:modules['transaction'].commit()
Now the new CLI-script can be used together with the bin/instance -Ositeid run command provided by plone.recipe.zope2instance, so that transmogrifier will get your site as its context simply by calling zope.component.hooks.getSite:
$ bin/instance -OPlone run bin/transmogrify pipeline.cfg --context=zope.component.hooks.getSite
With Plone you should, of course, still use Python 2.7.