Automated content migration - from DokuWiki to Confluence

    Learn how we migrated our project documentation using automation, including the pros and cons of manual versus automated migration, and the automated tools and methods that made the whole process seamless.

    At AMPLEXOR Netherlands we used DokuWiki as our wiki for several years. As our needs evolved, however, so did our need for a better solution. Since other AMPLEXOR teams were already using Confluence, it made sense for us to adopt it too. Confluence offered several advantages:

    1. Confluence integrates with Jira, our issue tracker. This lets us provide insight into development work and generate automatic reports, saving us time.
    2. Since the whole group is using Confluence, knowledge is easy to share among the members of the group.
    3. Confluence offers more integrations and custom components than DokuWiki, adding more functionality to our wiki.

    The question then was how to get all of our existing wiki content into Confluence. There is no out-of-the-box solution for this migration. There is, however, a tool: the Universal-Wiki-Converter (UWC) by AppFusions, which applies a series of conversions powered by regular expressions. The converter transforms a page from a supported wiki into a Confluence page, retaining its style and layout.
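    To give an idea of what a regex-driven converter does, here is a minimal Python sketch of two such rules. These are simplified, hypothetical rules written for illustration, not the UWC's actual rule set: DokuWiki headings (six `=` signs on each side for level 1, down to two for level 5) become Confluence wiki-markup headings such as `h1.`, and DokuWiki `**bold**` becomes Confluence `*bold*`.

```python
import re

# Hypothetical, simplified rules in the spirit of a regex-based converter;
# this is NOT the actual UWC rule set.
# DokuWiki marks a level-1 heading with six '=' on each side, down to two for level 5.
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)
BOLD = re.compile(r"\*\*(.+?)\*\*")


def convert_heading(match: re.Match) -> str:
    level = 7 - len(match.group(1))  # "======" -> h1, "==" -> h5
    return f"h{level}. {match.group(2)}"


def dokuwiki_to_confluence(text: str) -> str:
    """Apply each regex rule in turn, as a chain of text substitutions."""
    text = HEADING.sub(convert_heading, text)
    text = BOLD.sub(r"*\1*", text)  # **bold** -> *bold*
    return text


print(dokuwiki_to_confluence("====== Setup ======\nRun **make** first."))
```

    Rules like these are applied in a fixed order and know nothing about context (they would also rewrite a line inside a code block that happens to look like a heading), which is one reason a regex-based converter can stumble on specific markup.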

    While the UWC suits our needs, it is only a tool, and it has trouble handling certain wiki markup. This means our DokuWiki pages need to be tweaked before they can be transformed into Confluence pages. This could be done manually, but because we have a large number of pages, that would take a tremendous amount of time and resources. We therefore decided to automate the job with a script, so that a single run produces an exported wiki. This scenario also shows why automation can be so useful.
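    A minimal sketch of such a pre-processing script, assuming the pages are the plain `.txt` files DokuWiki keeps under its `data/pages` directory (one subdirectory per namespace). Each fix is modelled as a plain function from page text to page text; the names `preprocess_tree` and `Fix` are hypothetical. Writing to a separate output directory leaves the original wiki untouched, so the whole run can simply be repeated after every tweak.

```python
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical type: a "fix" takes the raw DokuWiki text of one page and
# returns the tweaked text.
Fix = Callable[[str], str]


def preprocess_tree(src: Path, dst: Path, fixes: Iterable[Fix]) -> int:
    """Apply every fix to every page under src, mirroring the tree into dst.

    Returns the number of pages processed. The source tree is never modified.
    """
    fixes = list(fixes)  # materialize so the same fixes run on every page
    count = 0
    for page in sorted(src.rglob("*.txt")):
        text = page.read_text(encoding="utf-8")
        for fix in fixes:
            text = fix(text)
        target = dst / page.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        count += 1
    return count
```

    A run might look like `preprocess_tree(Path("dokuwiki/data/pages"), Path("export/pages"), [fix_a, fix_b])`, where `fix_a` and `fix_b` stand for whatever page fixes the migration needs.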

    Once the UWC was up and running, the first issue quickly arose. Our former wiki made heavy use of hierarchy for its attachments, which the UWC does not support. The result was a wiki with all of its text present but none of its attachments or links, which defeated the whole purpose. Fixing this problem was easier said than done. After some proofs of concept, our solution was to centralize our files: we pulled them out of the hierarchy and gave each one a unique name. This method works, but it also means that the filenames in all of our images and links have to be converted first. Because we had decided to automate the migration, this step was trivial; by hand, it would have taken a considerable amount of time. So our decision to automate the process was already paying off.
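    The flattening step can be sketched as follows, assuming DokuWiki's default layout, where media files live under `data/media` with one directory per namespace and pages embed them with `{{namespace:file}}` syntax. The underscore-joining naming scheme and all function names are illustrative assumptions. The sketch only rewrites `{{...}}` embeds with absolute IDs; relative media IDs, which DokuWiki resolves against the current page's namespace, and `[[...]]` links would need extra rules.

```python
import re
import shutil
from pathlib import Path

# DokuWiki embeds media as {{namespace:sub:file.png?200|caption}}, optionally
# padded with spaces (used for alignment) and with a leading ':' for absolute IDs.
# IDs containing '/' (such as external URLs) deliberately do not match.
MEDIA_REF = re.compile(r"\{\{(\s*):?([\w.:-]+?)([?|][^}]*)?(\s*)\}\}")


def flat_name(media_id: str) -> str:
    """Hypothetical scheme: 'project:docs:diagram.png' -> 'project_docs_diagram.png'."""
    return media_id.strip(":").replace(":", "_")


def rewrite_media_refs(text: str) -> str:
    """Point every embedded media reference at its flattened file name."""
    def repl(m: re.Match) -> str:
        # Keep padding and size/caption parameters; only the ID changes.
        return "{{" + m.group(1) + flat_name(m.group(2)) + (m.group(3) or "") + m.group(4) + "}}"
    return MEDIA_REF.sub(repl, text)


def flatten_media(media_root: Path, out_dir: Path) -> dict[str, str]:
    """Copy every file under media_root into out_dir with a flattened, unique name.

    Returns a mapping from original media ID to new file name.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for f in sorted(media_root.rglob("*")):
        if not f.is_file():
            continue
        media_id = ":".join(f.relative_to(media_root).parts)
        new_name = flat_name(media_id)
        if new_name in used:
            # e.g. 'a_b:c.png' and 'a:b_c.png' both flatten to 'a_b_c.png'
            raise ValueError(f"name collision: {media_id!r} -> {new_name!r}")
        used.add(new_name)
        shutil.copy2(f, out_dir / new_name)
        mapping[media_id] = new_name
    return mapping
```

    Failing loudly on a collision, rather than silently overwriting a file, is what keeps the new names unique; the returned mapping can also be kept as a record of where every attachment went.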

    Our plan was to move all of our old pages into a new space in Confluence and reorganize them later. During testing, however, we discovered this would not work: moving a page to a different space broke all of the links to that page. After some trial and error, we found that this issue disappears once a page has been saved. This discovery led to our next challenge: how do we save every page without having someone spend a whole day clicking ‘save’? Our eventual answer came from our experience with automated testing. Selenium, a software suite commonly used for automated testing, can perform all kinds of browser interactions. Because it can be controlled from code, we could use it to press save on a page and then move on to the next one.
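    A minimal sketch of that loop using Selenium's Python bindings. It assumes a browser session that is already logged in, a Confluence Server-style edit URL of the form `/pages/editpage.action?pageId=<id>`, and a save button with the id `rte-button-publish`. The URL pattern, the button id and the function names are assumptions about the Confluence version in use and should be checked in the browser's developer tools; the list of page IDs has to come from elsewhere, for example Confluence's REST API.

```python
from urllib.parse import urlencode


def edit_url(base_url: str, page_id: int) -> str:
    """Build the (assumed) Confluence Server edit URL for one page."""
    return f"{base_url.rstrip('/')}/pages/editpage.action?{urlencode({'pageId': page_id})}"


def resave_pages(driver, base_url: str, page_ids,
                 save_button_id: str = "rte-button-publish",
                 timeout: float = 30) -> None:
    """Open each page in the editor and press save, one page after another.

    `driver` is an already logged-in Selenium WebDriver. `save_button_id` is an
    assumption about the editor's markup; verify it for your Confluence version.
    """
    # Imported here so edit_url() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    for page_id in page_ids:
        driver.get(edit_url(base_url, page_id))
        save = wait.until(EC.element_to_be_clickable((By.ID, save_button_id)))
        save.click()
        # Saving navigates away from the editor; we take the old button dropping
        # out of the DOM as the signal that the save went through.
        wait.until(EC.staleness_of(save))
```

    Typical usage would be to create a browser with `from selenium import webdriver` and `driver = webdriver.Firefox()`, log in, and then call `resave_pages(driver, "https://wiki.example.com", page_ids)`, where the host name is a placeholder.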

    We work in software development, and our wiki reflects this: it contains a lot of code snippets and examples. After converting our wiki pages, we noticed strange characters at the end of our code snippets. On investigation, it turned out that these characters represented a line ending, more precisely a carriage return. This was odd, since the rest of our wiki contained none of these mystery characters. After reading the UWC source code, we found it to be a quirk of the conversion process: the UWC DokuWiki converter has a bug that turns everything inside a code block into HTML, including our carriage returns. Because the converter is no longer actively maintained, we had to create our own fix, which was to remove that specific HTML from our pages.
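    A fix of this kind can be as small as one regular expression, run over the converted pages. The sketch below assumes the carriage return ended up as an HTML numeric character reference (decimal `&#13;` or hexadecimal `&#xD;`); the exact string the converter emitted is an assumption, so the pattern should be checked against what actually appears in the converted pages. The function name is hypothetical.

```python
import re

# Assumed encodings of a carriage return (U+000D) as an HTML numeric character
# reference: decimal "&#13;" or hexadecimal "&#xD;" / "&#x0D;", in any case.
CR_ENTITY = re.compile(r"&#(?:0*13|[xX]0*[dD]);")


def strip_cr_entities(text: str) -> str:
    """Remove encoded carriage returns while leaving all other entities intact."""
    return CR_ENTITY.sub("", text)


page = "print('hello')&#13;\nexit()&#xD;\n"
print(strip_cr_entities(page))
```

    Running the fix as a separate step after conversion keeps it independent of the UWC source, which matters when the tool itself is no longer maintained.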

    Conclusion

    We have successfully migrated our entire wiki. The process took more time and effort than we originally planned, but that was nothing compared to the time we saved by automating it. At the start of the project, automating everything seemed excessive, but in the end we were glad we did. What's more, the migration scripts we created can be reused by other teams or clients looking to make similar migrations.

    Published on 18/09/18    Last updated on 20/09/18

    #Content Management, #Software Development

    About the author

    Guus Hamm is a Junior Digital Experience Technical Consultant at AMPLEXOR, based in The Netherlands. Specializing in Adobe Experience Manager, Guus supports the design and development of WCMS solutions for clients across industries. He has a focus on automation developed through work in DevOps teams.
