Converting WordPress Articles to JSON
When you switch blog software, you naturally want to carry your old posts over to the new system. Many blogging platforms therefore offer import/export functions. Grav, however, is primarily a CMS and not blog software, so it had no such feature. I migrated the more recent articles by hand, partly to try out Grav's features, and in some cases merged article series into single posts.
That still left about 60 older articles. Migrating all of them by hand would have taken quite a while, so I opted for an automated solution.
First, I created a new view in the WordPress database:
CREATE VIEW export AS SELECT `ID`,`post_date`,`post_content`,`post_title` FROM `posts` WHERE `post_status` = 'publish'
The view merely provides a new pseudo-table, export, that contains only the columns ID, post_date, post_content and post_title of the published posts. I then exported it as a JSON file with phpMyAdmin. With MySQL 5.7 and later you can also export to JSON directly with built-in tools, but who has such modern stuff? ;)
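To make the effect of the view concrete, here is a minimal sketch that rebuilds it in an in-memory SQLite database and dumps it as JSON. SQLite is only a stand-in for MySQL, the two sample rows and the extra post_author column are invented for illustration, and the JSON is written with Python's json module rather than phpMyAdmin.

```python
import json
import sqlite3

# In-memory stand-in for the WordPress database (SQLite instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE posts (
        ID INTEGER PRIMARY KEY,
        post_date TEXT,
        post_content TEXT,
        post_title TEXT,
        post_status TEXT,
        post_author INTEGER
    );
    INSERT INTO posts VALUES
        (1, '2013-02-05 21:14:43', 'Lorem ipsum...', 'Published post', 'publish', 1),
        (2, '2013-02-06 08:00:00', 'Unfinished...', 'Draft post', 'draft', 1);
    CREATE VIEW export AS
        SELECT ID, post_date, post_content, post_title
        FROM posts
        WHERE post_status = 'publish';
""")

# The view exposes only the four selected columns of the published rows.
rows = [dict(row) for row in conn.execute("SELECT * FROM export")]
export_json = json.dumps({"posts": rows}, ensure_ascii=False, indent=2)
print(export_json)
```

The draft row and the post_author column do not appear in the output. Unlike the phpMyAdmin export, which writes every value as a string, this sketch keeps ID as an integer.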
The result looks roughly like this:
{
"posts": [
{"ID":"282","post_date":"2013-02-05 21:14:43","post_content":"Lorem ipsum...","post_title":"WPKG 2: Windows-Softwareverteilung einrichten"},
{"ID":"276","post_date":"2013-02-05 17:20:26","post_content":"Lorem ipsum...","post_title":"about"},
{"ID":"279","post_date":"2013-02-05 19:46:14","post_content":"Lorem ipsum...","post_title":"WPKG 1: Windows-Softwareverteilung mit Linuxmitteln"},
{"ID":"273","post_date":"2013-02-05 13:30:00","post_content":"Lorem ipsum...","post_title":"RAID: Festplatte ersetzen"},
{"ID":"268","post_date":"2013-02-05 08:36:14","post_content":"Lorem ipsum...","post_title":"Twitter: @Sysadm_Borat"},
{"ID":"270","post_date":"2013-02-05 09:21:04","post_content":"Lorem ipsum...","post_title":"SysAdmin-Apps: Nagify \/ OnCall"},
{"ID":"214","post_date":"2013-02-02 22:12:56","post_content":"Lorem ipsum...","post_title":"Nachträgliche Systemdokumentation"},
{"ID":"221","post_date":"2013-02-03 22:58:33","post_content":"Lorem ipsum...","post_title":"CLI Adventures: Konfigdateien ohne Kommentarzeilen"},
{"ID":"232","post_date":"2013-02-04 23:33:52","post_content":"Lorem ipsum...","post_title":"Serverschaden"},
{"ID":"689","post_date":"2013-02-10 15:42:02","post_content":"Lorem ipsum...","post_title":"Samba mit LDAP-Anbindung"},
{"ID":"163","post_date":"2013-01-30 12:13:04","post_content":"Lorem ipsum...","post_title":"Opie: One Time Passwords"},
{"ID":"362","post_date":"2013-02-07 05:03:19","post_content":"Lorem ipsum...","post_title":"E-Mail-Server 1: OpenLDAP (Debian Wheezy)"},
{"ID":"211","post_date":"2013-01-31 23:19:42","post_content":"Lorem ipsum...","post_title":"CLI Adventures: Einfaches E-Mail Backup"},
{"ID":"176","post_date":"2013-01-30 18:20:26","post_content":"Lorem ipsum...","post_title":"sudo: Partielle root-Rechte"},
{"ID":"185","post_date":"2013-01-30 18:44:00","post_content":"Lorem ipsum...","post_title":"CLI Adventures: Pipe Viewer"},
{"ID":"236","post_date":"2013-02-05 01:35:05","post_content":"Lorem ipsum...","post_title":"nagios\/icinga: Festplattenmonitoring"},
{"ID":"193","post_date":"2013-01-30 22:11:22","post_content":"Lorem ipsum...","post_title":"Apache: LDAP-Authentifizierung"},
{"ID":"207","post_date":"2013-01-31 11:58:42","post_content":"Lorem ipsum...","post_title":"nagios\/icinga: Push Notification"},
{"ID":"305","post_date":"2013-02-05 23:32:56","post_content":"Lorem ipsum...","post_title":"WPKG 3: wpkgExpress"},
{"ID":"313","post_date":"2013-02-06 00:14:57","post_content":"Lorem ipsum...","post_title":"WPKG 4: AutoIT"},
{"ID":"351","post_date":"2013-02-06 22:24:47","post_content":"Lorem ipsum...","post_title":"Xen 4.1 und LVM: Installation und Konfiguration (Debian Wheezy)"},
{"ID":"370","post_date":"2013-02-07 06:42:31","post_content":"Lorem ipsum...","post_title":"E-Mail Server 2: Dovecot (Debian Wheezy)"},
{"ID":"374","post_date":"2013-02-07 07:21:24","post_content":"Lorem ipsum...","post_title":"E-Mail-Server 3: postfix, greylisting, amavis (Debian Wheezy)"},
{"ID":"378","post_date":"2013-02-07 07:42:16","post_content":"Lorem ipsum...","post_title":"E-Mail-Server 4: postgrey, spamassassin, clamav, postfix-filter"},
{"ID":"391","post_date":"2013-02-07 09:57:48","post_content":"Lorem ipsum...","post_title":"Sichere Mailkommunikation mit mTLS"},
{"ID":"394","post_date":"2013-02-07 15:43:04","post_content":"Lorem ipsum...","post_title":"E-Mail-Server 5: Backup-MX"},
{"ID":"401","post_date":"2013-02-07 13:09:48","post_content":"Lorem ipsum...","post_title":"QNAP: munin-lite auf QNAP NAS Geräten"},
{"ID":"404","post_date":"2013-02-07 14:29:10","post_content":"Lorem ipsum...","post_title":"Debian Remote Installer"},
{"ID":"606","post_date":"2013-02-09 17:42:49","post_content":"Lorem ipsum...","post_title":"E-Mail-Server 6: Dovecot-Mailquota"},
{"ID":"438","post_date":"2013-02-07 20:19:51","post_content":"Lorem ipsum...","post_title":"MySQL-Replication"},
{"ID":"444","post_date":"2013-02-07 20:42:16","post_content":"Lorem ipsum...","post_title":"NFS: Dateifreigabe mit Linux"},
{"ID":"446","post_date":"2013-02-07 21:42:21","post_content":"Lorem ipsum...","post_title":"mdadm RAID \/ luksCrypt"},
{"ID":"471","post_date":"2013-02-07 22:07:04","post_content":"Lorem ipsum...","post_title":"fail2ban: Unblock"},
{"ID":"490","post_date":"2013-02-08 12:35:16","post_content":"Lorem ipsum...","post_title":"knockd und iptables"},
{"ID":"492","post_date":"2013-02-08 13:23:52","post_content":"Lorem ipsum...","post_title":"CLI-Adventures: find"},
{"ID":"494","post_date":"2013-02-08 12:46:12","post_content":"Lorem ipsum...","post_title":"MacOS X: midnight commander"},
{"ID":"498","post_date":"2013-02-08 13:35:49","post_content":"Lorem ipsum...","post_title":"cron: runas_cron.sh"},
{"ID":"516","post_date":"2013-02-08 18:34:17","post_content":"Lorem ipsum...","post_title":"howto"},
{"ID":"534","post_date":"2013-02-08 20:26:03","post_content":"Lorem ipsum...","post_title":"XMPP: OpenFire Jabber-Server"},
{"ID":"540","post_date":"2013-02-08 21:42:05","post_content":"Lorem ipsum...","post_title":"VoIP-Server: mumble"},
{"ID":"549","post_date":"2013-02-08 22:13:45","post_content":"Lorem ipsum...","post_title":"E-Mail: Sieve-Filter"},
{"ID":"551","post_date":"2013-02-08 22:16:35","post_content":"Lorem ipsum...","post_title":"Terminal.app: Keyboard Mappings"},
{"ID":"553","post_date":"2013-02-08 22:24:08","post_content":"Lorem ipsum...","post_title":"RCS (Revision Control System) "},
{"ID":"574","post_date":"2013-02-09 07:49:52","post_content":"Lorem ipsum...","post_title":"Netzwerk: Bonding (Port Trunking)"},
{"ID":"576","post_date":"2013-02-09 11:42:03","post_content":"Lorem ipsum...","post_title":"Sicherheit: Mit DenyHosts SSH-Angriffe abwehren"},
{"ID":"753","post_date":"2013-02-18 14:33:24","post_content":"Lorem ipsum...","post_title":"LDAP, Linux und Außendienstnotebooks"},
{"ID":"582","post_date":"2013-02-09 15:23:56","post_content":"Lorem ipsum...","post_title":"bash: Eine praktische Einführung"},
{"ID":"597","post_date":"2013-02-09 10:15:35","post_content":"Lorem ipsum...","post_title":"Linux: Using exFAT"},
{"ID":"599","post_date":"2013-02-09 11:20:00","post_content":"Lorem ipsum...","post_title":"Samba: Shares mounten unter Linux"},
{"ID":"602","post_date":"2013-02-09 14:21:28","post_content":"Lorem ipsum...","post_title":"PHP: HandBrakeCLI-Konverterscript"},
{"ID":"604","post_date":"2013-02-09 16:23:48","post_content":"Lorem ipsum...","post_title":"Linux: IPv6 Network-Settings"},
{"ID":"610","post_date":"2013-02-09 21:23:25","post_content":"Lorem ipsum...","post_title":"E-Mail-Server: RoundCube-Webmailer"},
{"ID":"612","post_date":"2013-02-09 20:15:37","post_content":"Lorem ipsum...","post_title":"Nagios\/Icinga: Serverüberwachung einrichten"},
{"ID":"618","post_date":"2013-02-09 22:10:01","post_content":"Lorem ipsum...","post_title":"SFTP\/SCP-Server \/ Jail"},
{"ID":"620","post_date":"2013-02-09 12:22:26","post_content":"Lorem ipsum...","post_title":"Firefox: Eigener Sync-Server"},
{"ID":"633","post_date":"2013-02-09 16:26:02","post_content":"Lorem ipsum...","post_title":"Xen: VirtualBox-Images konvertieren"},
{"ID":"645","post_date":"2013-02-09 19:11:18","post_content":"Lorem ipsum...","post_title":"bash: Daten ausgeben mit printf"},
{"ID":"903","post_date":"2013-09-07 10:30:41","post_content":"Lorem ipsum...","post_title":"Neuer Server: Erste Clustervorbereitungen (DRBD)"},
{"ID":"671","post_date":"2013-02-10 11:38:15","post_content":"Lorem ipsum...","post_title":"Chat: IRC Server"},
{"ID":"673","post_date":"2013-02-10 13:42:48","post_content":"Lorem ipsum...","post_title":"VirtualBox"},
]
}
Where it says Lorem ipsum..., the export of course contains the actual content of the post. My WordPress articles were written in a special format that could be converted to Markdown fairly easily with simple search and replace.
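Such a search-and-replace pass could be sketched as follows. The post does not say which markup the old articles used, so the HTML-like rules below are purely hypothetical placeholders, and the names RULES and to_markdown are made up for this example.

```python
import re

# Hypothetical rules: the actual source format is not described in the post.
# Each entry is (regex pattern, replacement).
RULES = [
    (r"<strong>(.*?)</strong>", r"**\1**"),
    (r"<em>(.*?)</em>", r"*\1*"),
    (r"<h2>(.*?)</h2>", r"## \1"),
    (r'<a href="(.*?)">(.*?)</a>', r"[\2](\1)"),
]


def to_markdown(text):
    """Apply every search-and-replace rule in order."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text


print(to_markdown('<h2>Setup</h2>\nSee <a href="https://example.org">the <em>docs</em></a>.'))
```

Keeping the rules in a list makes it easy to add or reorder replacements without touching the loop.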
Since Grav stores all pages as plain text files in the file system, creating the articles was now very easy; a small Python script is enough:
#!/usr/bin/python
# Python 2 script: it relies on the built-in unicode type.
import os
import re
import json
import unicodedata


def slugify(value):
    # Strip accents, drop non-word characters and join the words with hyphens.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    value = unicode(re.sub(r'[^\w\s-]', '', value).strip().lower())
    value = unicode(re.sub(r'[-\s]+', '-', value))
    return value


with open('posts.json') as f:
    posts = json.load(f)

# Number prefix of the first imported page folder.
i = 30
for post in posts['posts']:
    # One folder per article, named "<number>.<slug>".
    dir_name = str(i) + '.' + slugify(post['post_title'])
    if not os.path.exists(dir_name):
        os.makedirs(dir_name)
    i = i + 1

    # Split "YYYY-MM-DD HH:MM:SS" into its date and time parts.
    date_time = post['post_date'].split()
    date = date_time[0].split('-')
    time = date_time[1].split(':')

    # Write the YAML front matter, followed by the article body.
    ff = open(dir_name+'/item.de.md', 'w')
    ff.write("---")
    ff.write("\n")
    ff.write("title: '"+ post['post_title'].encode('utf8') + "'")
    ff.write("\n")
    # The date is written as DD-MM-YYYY HH:MM.
    ff.write("date: '"+ date[2] +'-'+ date[1] +'-'+ date[0] +' '+ time[0] +':'+ time[1] + "'")
    ff.write("\n")
    ff.write("taxonomy:")
    ff.write("\n")
    ff.write(" category:")
    ff.write("\n")
    ff.write(" - blog")
    ff.write("\n")
    ff.write(" tag:")
    ff.write("\n")
    ff.write(" - imported")
    ff.write("\n")
    ff.write("---")
    ff.write("\n")
    ff.write(post['post_content'].encode('utf8'))
    ff.close()
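The script above targets Python 2. For anyone repeating the migration on a system that only ships Python 3, a port might look roughly like the sketch below. The function names front_matter and write_posts are made up here, and the doubling of single quotes in titles is an addition that the original script does not perform (YAML requires it inside single-quoted strings).

```python
import os
import re
import unicodedata
from datetime import datetime


def slugify(value):
    """Turn a title into an ASCII, lower-case, hyphen-separated folder name."""
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[-\s]+", "-", value)


def front_matter(post):
    """Build the YAML header with the same fields as the Python 2 script."""
    stamp = datetime.strptime(post["post_date"], "%Y-%m-%d %H:%M:%S")
    # Assumption: escape single quotes by doubling them (not in the original).
    title = post["post_title"].replace("'", "''")
    return "\n".join([
        "---",
        "title: '%s'" % title,
        "date: '%s'" % stamp.strftime("%d-%m-%Y %H:%M"),
        "taxonomy:",
        "    category:",
        "        - blog",
        "    tag:",
        "        - imported",
        "---",
        "",
    ])


def write_posts(posts, target_dir=".", start=30):
    """Create one '<number>.<slug>/item.de.md' file per exported post."""
    for number, post in enumerate(posts, start):
        folder = "%d.%s" % (number, slugify(post["post_title"]))
        page_dir = os.path.join(target_dir, folder)
        os.makedirs(page_dir, exist_ok=True)
        path = os.path.join(page_dir, "item.de.md")
        with open(path, "w", encoding="utf-8") as out:
            out.write(front_matter(post) + post["post_content"])


# Small demonstration with one record shaped like the export.
sample = {
    "post_date": "2013-02-05 21:14:43",
    "post_title": "WPKG 2: Windows-Softwareverteilung einrichten",
    "post_content": "Lorem ipsum...",
}
print(slugify(sample["post_title"]))
print(front_matter(sample))
```

Loading posts.json with json.load and passing its posts list to write_posts would then produce the same folder layout as the original script.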
The i = 30 is there because I had already migrated 29 articles by hand, and Grav prefixes the page folders with a sequential number (this lets you define the order of the pages; for a blog you can also simply sort by date).
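Instead of hard-coding the start value, the next free number could also be derived from the folders that already exist. The following is only a sketch: next_page_number is a hypothetical helper, and the folder names in the demonstration are invented.

```python
import os
import re
import tempfile


def next_page_number(pages_dir):
    """Return one more than the highest '<number>.' folder prefix found."""
    numbers = []
    for name in os.listdir(pages_dir):
        match = re.match(r"(\d+)\.", name)
        # Only count directories whose name starts with "<digits>.".
        if match and os.path.isdir(os.path.join(pages_dir, name)):
            numbers.append(int(match.group(1)))
    return max(numbers, default=0) + 1


# Demonstration in a throwaway directory with two numbered page folders.
with tempfile.TemporaryDirectory() as pages_dir:
    for name in ("01.first-post", "29.last-manual-post", "drafts"):
        os.mkdir(os.path.join(pages_dir, name))
    print(next_page_number(pages_dir))  # prints 30
```

Folders without a numeric prefix, such as drafts here, are ignored, and an empty directory yields 1.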
A few minor touch-ups later, all articles were back online. I hope I have not overlooked any small glitches; if I have, I appreciate any pointers.