How do we approach content migration into WordPress?
In 2020, the majority of B2B companies already have an online presence. So when a B2B company approaches ClarityDX for website design and development, more often than not they are looking to redesign and rebuild an existing website to support their marketing efforts and growth.
Some of these B2B websites have archives of content dating back many years, and, rightly, most of these companies do not want that content to disappear into the ether.
As a result, we incorporate a content migration strategy into the project scope. If you’re wondering exactly how we approach content migration in different scenarios, read on.
As with most projects, we begin with the most important stage: discovery.
The technology the existing site is built with will often define our content migration approach
First and foremost we need to know what technology the existing site is built with – this greatly influences our approach. Is the site already in WordPress? If not, does it use another content management system? Does it use an SQL database? Is there some sort of export functionality? These factors will define our decision making from the start. We usually ask the client to provide us with details and (if they can) access to their system so we can poke around a bit and get a feel for what we’re dealing with.
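Before we get access, the public HTML can already hint at the platform. The Python sketch below checks a page for a few well-known WordPress markers (a `generator` meta tag, and asset paths under `/wp-content/` or `/wp-includes/`). The marker list is our own assumption and deliberately rough: themes and security plugins often remove the generator tag, so a negative result does not prove a site isn't WordPress.

```python
import re

# Hypothetical, non-exhaustive signals that a page is served by WordPress.
WORDPRESS_MARKERS = (
    # <meta name="generator" content="WordPress 5.5.1"> (often removed by themes/plugins)
    re.compile(r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']WordPress', re.IGNORECASE),
    # Theme, plugin and upload assets live under /wp-content/ by default
    re.compile(r"/wp-content/", re.IGNORECASE),
    # Core scripts and styles are served from /wp-includes/
    re.compile(r"/wp-includes/", re.IGNORECASE),
)


def looks_like_wordpress(html: str) -> bool:
    """Return True if the page HTML contains any of the WordPress markers above."""
    return any(marker.search(html) for marker in WORDPRESS_MARKERS)
```

The HTML itself can be fetched with any HTTP client; a positive result is only a prompt to ask the client for database access, not a substitute for it.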
How much and what type of content needs to be migrated to the new site?
Next, we need to analyse the content and understand what the client would like to move across to the new site, including how much content there is and what kinds of data to expect. Content can range from relatively simple blog or news posts, which contain basic information such as a title, date, author, and simple HTML content, to more elaborate pages containing downloads, imagery, layouts, styles, taxonomy information, and so on. Crawlers such as Screaming Frog are great for crawling an entire site so we can see everything it contains.
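Once a crawl is done, one quick way to size the job is to group the discovered URLs by their first path segment, which often corresponds to a content section such as news or blog. This Python sketch assumes a plain list of URLs (for example, the address column of a crawl export). The grouping rule is our own heuristic for illustration, not a Screaming Frog feature.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlparse


def count_by_section(urls: Iterable[str]) -> Counter:
    """Count crawled URLs by their first path segment, e.g. /news/launch -> 'news'.

    The home page and single-segment URLs (/, /about) are grouped under '(root)'.
    """
    counts: Counter = Counter()
    for url in urls:
        segments = [part for part in urlparse(url).path.split("/") if part]
        section = segments[0] if len(segments) > 1 else "(root)"
        counts[section] += 1
    return counts
```

Calling `count_by_section(urls).most_common()` lists the largest sections first, which is a useful first signal of where an automated migration would pay off.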
Once we have an idea of the content and technology we can begin to scope and define our approach.
Manual or automated approach
There are two main approaches: manual and automated. Which one we choose mostly depends on the complexity and quantity of the content (and often on the client’s budget).
A manual migration involves someone (either the ClarityDX team or the client) building out the required pages on the new site from scratch, essentially by copying and pasting content from the old website. This is usually done for main ‘pages’, which have varied layouts and elements and are too complex for an automated migration.
For an automated migration we write a script to loop over the content items and programmatically add them into WordPress as posts or pages. This works really well for simpler content and larger quantities, such as news or blog posts. The type of script we write depends on the technology and database structure of the existing site, which is why the technical discovery at the start of the project is so important.
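The shape of such a script is a simple loop: map each source item onto post fields, then create the post. The Python sketch below assumes the import runs from outside WordPress via the REST API's posts endpoint (`/wp-json/wp/v2/posts`, which accepts `title`, `content`, `date` and `status`); a script running inside WordPress would typically call `wp_insert_post()` instead. The source keys (`title`, `body`, `published`) and the choice to import as drafts are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def build_post_payload(item: Dict[str, str]) -> Dict[str, str]:
    """Map one legacy content item onto WordPress REST API post fields.

    Source keys ('title', 'body', 'published') are hypothetical; target keys
    are fields the REST API accepts when creating a post.
    """
    payload = {
        "title": item["title"].strip(),
        "content": item.get("body", ""),
        "status": "draft",  # assumption: import as drafts so content is reviewed first
    }
    if item.get("published"):
        payload["date"] = item["published"]  # e.g. '2019-03-01T09:00:00'
    return payload


def migrate(items: Iterable[Dict[str, str]],
            create_post: Callable[[Dict[str, str]], int]) -> List[int]:
    """Create one WordPress post per source item and return the new post IDs.

    `create_post` is injected (for example, a function that POSTs the payload
    to /wp-json/wp/v2/posts and returns the new ID), so the loop itself can be
    tested without a live site.
    """
    return [create_post(build_post_payload(item)) for item in items]
```

Keeping the mapping separate from the insert call means the same mapping can be dry-run and checked before anything is written to the new site.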
WordPress to WordPress content migration
The most common and simplest migration we run is when the site is already in WordPress. In this case, we download the existing database and import it onto the same server as the new site. We can then write a bespoke script that queries that database and picks out specific post content, metadata, taxonomy data, etc. from any custom post type.
There are limitations, of course, and we often have to make site-specific tweaks to the script. For example, because we build our sites using ACF and Gutenberg blocks, we can’t support page builders such as Divi, so we have to strip out the shortcodes the builder leaves in the content.
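To illustrate the kind of queries involved, here is a Python sketch against the standard WordPress tables (`wp_posts`, `wp_postmeta` and the three term tables). It uses `sqlite3` only as a stand-in so the example runs on its own; against the real MySQL copy you would use a MySQL driver, whose placeholder syntax (typically `%s` rather than `?`) differs. It also assumes the default `wp_` table prefix, which is set by `$table_prefix` in the old site's `wp-config.php`.

```python
import sqlite3
from typing import Any, Dict, List


def fetch_posts(conn: sqlite3.Connection, post_type: str = "post") -> List[Dict[str, Any]]:
    """Return published posts of one post type, oldest first."""
    rows = conn.execute(
        """
        SELECT ID, post_title, post_content, post_date
        FROM wp_posts
        WHERE post_type = ? AND post_status = 'publish'
        ORDER BY post_date
        """,
        (post_type,),
    )
    return [dict(zip(("id", "title", "content", "date"), row)) for row in rows]


def fetch_meta(conn: sqlite3.Connection, post_id: int) -> Dict[str, str]:
    """Return a {meta_key: meta_value} dict for one post."""
    rows = conn.execute(
        "SELECT meta_key, meta_value FROM wp_postmeta WHERE post_id = ?",
        (post_id,),
    )
    return {key: value for key, value in rows}


def fetch_terms(conn: sqlite3.Connection, post_id: int, taxonomy: str = "category") -> List[str]:
    """Return the names of the terms in one taxonomy attached to a post."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM wp_terms t
        JOIN wp_term_taxonomy tt ON tt.term_id = t.term_id
        JOIN wp_term_relationships tr ON tr.term_taxonomy_id = tt.term_taxonomy_id
        WHERE tr.object_id = ? AND tt.taxonomy = ?
        ORDER BY t.name
        """,
        (post_id, taxonomy),
    )
    return [name for (name,) in rows]
```

PHP-serialized values in `wp_postmeta` come back as raw strings and need unpacking separately before they are mapped onto new fields.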
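As an illustration of that clean-up step, this Python sketch removes Divi builder shortcodes with a regular expression while keeping the HTML they wrap. It assumes Divi's builder shortcodes share the `et_pb_` prefix (as in `[et_pb_section]` and `[et_pb_text]`); it is a heuristic, not a full shortcode parser, and leaves other shortcodes untouched.

```python
import re

# Opening, closing and self-closing Divi builder tags, e.g.
# [et_pb_section fb_built="1"], [/et_pb_text], [et_pb_divider /].
DIVI_SHORTCODE = re.compile(r"\[/?et_pb_[a-z0-9_]+[^\]]*\]", re.IGNORECASE)


def strip_divi_shortcodes(content: str) -> str:
    """Remove Divi shortcode tags but keep the HTML/text between them."""
    return DIVI_SHORTCODE.sub("", content).strip()
```

Anything the builder stores in shortcode attributes rather than between tags (an image URL, for example) is discarded along with the tag, so those modules need separate handling.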
SQL to WordPress content migration
If the existing site isn’t in WordPress but does have an SQL database, the process is very similar to the one above. The technologies are still compatible, so we import the SQL database onto the server alongside the new site and again write a bespoke script that queries the database and picks out the relevant data.
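With a non-WordPress schema, much of the mapping can happen in the query itself by aliasing the old columns to WordPress field names. The Python sketch below assumes a hypothetical legacy table `articles(id, headline, body_html, published_at, is_live)` and again uses `sqlite3` as a stand-in for the real database.

```python
import sqlite3
from typing import Any, Dict, List

# Hypothetical legacy schema: articles(id, headline, body_html, published_at, is_live).
# Aliasing each column to a WordPress field name keeps the rest of the import
# script independent of the old schema.
LEGACY_ARTICLES_QUERY = """
    SELECT headline     AS post_title,
           body_html    AS post_content,
           published_at AS post_date
    FROM articles
    WHERE is_live = 1
    ORDER BY published_at
"""


def fetch_legacy_articles(conn: sqlite3.Connection) -> List[Dict[str, Any]]:
    """Run the mapping query and return rows as dicts keyed by the aliases."""
    cursor = conn.cursor()
    cursor.execute(LEGACY_ARTICLES_QUERY)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]
```

Because the aliases already match WordPress field names, the code that inserts the posts does not need to know anything about the old schema.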
‘Scraping’ content for migration
A more complicated case is when the website doesn’t use SQL, has no export functionality, or isn’t built on a CMS at all. In this case, we have to ‘scrape’ the HTML content from the site’s URLs.
In one such case, we wrote a script that takes a CSV file listing the URLs to be migrated. The script loops through each URL and fetches the HTML of the entire page. It then uses PHP’s DOMDocument class to filter that HTML, pick out the specific elements to be migrated, and insert them into WordPress on the fly.
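The script described here is PHP using DOMDocument; as a rough equivalent, this Python sketch uses the standard library's `html.parser` instead (a swapped-in technique, not the original script). It assumes a CSV with one URL per row in the first column and that the body content sits inside an element with the hypothetical class `article-body`. It keeps only paragraph text, so inline markup such as links is dropped.

```python
import csv
from html import escape
from html.parser import HTMLParser
from typing import Dict, Iterator, Tuple
from urllib.request import urlopen


class ArticleExtractor(HTMLParser):
    """Collect the first <h1> and the <p> text inside one content container.

    CONTAINER_CLASS is hypothetical: inspect the old site's markup to find
    the element that actually wraps the body content.
    """

    CONTAINER_CLASS = "article-body"

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._container_tag = None  # tag name of the matched container, e.g. "div"
        self._depth = 0             # nesting depth of that tag while inside it
        self._capture = None        # "h1" or "p" while collecting text
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._container_tag is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.CONTAINER_CLASS in classes:
                self._container_tag, self._depth = tag, 1
        elif tag == self._container_tag:
            self._depth += 1
        if tag == "h1" or (tag == "p" and self._depth > 0):
            self._capture, self._text = tag, []

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            if tag == "h1":
                self.title = self.title or text
            elif text:
                self.paragraphs.append(text)
            self._capture = None
        if tag == self._container_tag:
            self._depth -= 1
            if self._depth == 0:
                self._container_tag = None  # left the container

    def handle_data(self, data):
        if self._capture:
            self._text.append(data)


def extract(html: str) -> Dict[str, str]:
    """Return the page title and body paragraphs as WordPress post fields."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    body = "\n".join(f"<p>{escape(p)}</p>" for p in parser.paragraphs)
    return {"post_title": parser.title, "post_content": body}


def scrape_urls_from_csv(csv_path: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Fetch each URL listed in the CSV's first column and extract its content."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        urls = [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]
    for url in urls:
        with urlopen(url) as response:
            html = response.read().decode("utf-8", errors="replace")
        yield url, extract(html)
```

Each extracted dict can then be inserted into WordPress; elements outside the container (navigation, footers) are ignored by design.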
Sitecore to WordPress content migration
In another of our recent projects, the site was built in Sitecore, which runs on .NET rather than PHP and whose database can’t simply be imported alongside the new site in the way described above. However, we managed to get an export of the content as a JSON array. We were then able to write a script to loop over the array and insert the data into the relevant fields in WordPress.
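A loop of that kind might look like the Python sketch below. The export's field names (`Title`, `Body`, `Date`, `Tags`) are hypothetical, as a real Sitecore export will use its own; the output keys follow `wp_insert_post()` argument names (`post_title`, `post_content`, `post_date`, `tags_input`). Skipping untitled items is our own design choice.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical export shape; a real Sitecore export will use its own field names:
# [{"Title": "...", "Body": "<p>...</p>", "Date": "2020-06-01 09:00:00", "Tags": ["news"]}]


def posts_from_export(json_text: str) -> Tuple[List[Dict[str, Any]], List[int]]:
    """Map each item in a JSON array export onto wp_insert_post()-style fields.

    Returns the mapped posts plus the indexes of items skipped for having no
    title, so they can be reviewed rather than imported blank.
    """
    items = json.loads(json_text)
    if not isinstance(items, list):
        raise ValueError("expected the export to be a JSON array")
    posts, skipped = [], []
    for index, item in enumerate(items):
        title = str(item.get("Title") or "").strip()
        if not title:
            skipped.append(index)
            continue
        posts.append({
            "post_title": title,
            "post_content": item.get("Body", ""),
            "post_date": item.get("Date"),
            "tags_input": list(item.get("Tags", [])),
        })
    return posts, skipped
```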
In fact, this script became the groundwork for an all-purpose import/export plugin we currently have in development internally. The plugin aims to remove as much of the need for custom development as possible by letting the user upload a JSON file (and, in future, CSV or XML) and map the JSON fields onto the selected post type’s fields using a drag-and-drop interface.
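Whatever the interface looks like, a field mapping ultimately reduces to data: which source field feeds which post field. The Python sketch below shows one hypothetical way to store and apply such a mapping, with dotted paths for nested JSON. The mapping format and field names are illustrative assumptions, not the plugin's actual design.

```python
from typing import Any, Dict


def get_path(record: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'fields.summary' through nested dicts."""
    value: Any = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value


def apply_mapping(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Build post fields from a {target_field: source_path} mapping.

    Missing source paths produce None so gaps are visible, not silently dropped.
    """
    return {target: get_path(record, source) for target, source in mapping.items()}
```

Because the mapping is plain data, it can be saved per project and reused without writing new code for each migration.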
The plugin will also have export functionality, which will make it easier for us to migrate existing WordPress sites. The goal is eventually to have an extensible plugin that can handle any automated migration, whether from a WordPress site, another CMS, or a site we have to ‘scrape’ for content. Having our own migration plugin means we won’t have to rely on third-party or premium plugins (of which there are a lot), and it will allow the team to customise the functionality for a specific project if need be.
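One common way to make such a tool extensible, sketched here under our own assumptions rather than as the plugin's design, is a registry of readers that each turn a source format into the same plain records. The mapping and insert steps then never need to know where the data came from, and supporting XML or a scraper means registering one more reader. All names in this Python sketch are hypothetical.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]

# Registry of format readers: each turns raw text into a stream of flat records.
READERS: Dict[str, Callable[[str], Iterator[Record]]] = {}


def reader(fmt: str):
    """Decorator that registers a reader function under a format name."""
    def register(func):
        READERS[fmt] = func
        return func
    return register


@reader("json")
def read_json(text: str) -> Iterator[Record]:
    yield from json.loads(text)  # expects a JSON array of objects


@reader("csv")
def read_csv(text: str) -> Iterator[Record]:
    yield from csv.DictReader(io.StringIO(text))  # header row becomes the keys


def load_records(fmt: str, text: str) -> List[Record]:
    """Dispatch to the registered reader; unknown formats fail loudly."""
    try:
        read = READERS[fmt]
    except KeyError:
        raise ValueError(f"no reader registered for {fmt!r}") from None
    return list(read(text))
```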