A long time ago on a system far, far away…
Trying to migrate a database from MySQL to PostgreSQL. All the documentation I have read covers, in great detail, how to migrate the structure. I have found very little documentation on migrating the data. The schema has 13 tables (which have been migrated successfully) and 9 GB of data.
MySQL version: 5.1.x
PostgreSQL version: 8.4.x
I want to use the R programming language to analyze the data using SQL select statements; PostgreSQL has PL/R, but MySQL has nothing (as far as I can tell).
A New Hope
Create the database location (/var has insufficient space; I also dislike having the PostgreSQL version number everywhere, since upgrading would break scripts!):
sudo mkdir -p /home/postgres/main
sudo cp -Rp /var/lib/postgresql/8.4/main /home/postgres
sudo chown -R postgres.postgres /home/postgres
sudo chmod -R 700 /home/postgres
sudo usermod -d /home/postgres/ postgres
All good up to here. Next, restart the server and configure the database using these installation instructions:
sudo apt-get install postgresql pgadmin3
sudo /etc/init.d/postgresql-8.4 stop
sudo vi /etc/postgresql/8.4/main/postgresql.conf
sudo /etc/init.d/postgresql-8.4 start
sudo -u postgres psql postgres
sudo -u postgres createdb climate
Use pgadmin3 to configure the database and create a schema.
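For reference, the schema-creation step in pgadmin3 amounts to a single statement, which could equally be run from psql (the schema is named climate here to match the database):

```sql
CREATE SCHEMA climate;
```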
The episode continues in a remote shell known as bash, with both databases running, and the installation of a set of tools with a rather unusual logo: SQL Fairy.
sudo make install
sudo apt-get install perl-doc (strangely, the package is not called perldoc)
Extract a PostgreSQL-friendly DDL for all the tables:
sqlt -f DBI --dsn dbi:mysql:climate --db-user user --db-password password -t PostgreSQL > climate-pg-ddl.sql
Edit climate-pg-ddl.sql to convert the identifiers to lowercase and insert the schema reference (using VIM):
:%s/ TABLE / TABLE climate./g
:%s/ on / on climate./g
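The VIM substitutions can also be scripted with sed, which makes them repeatable (same file name as above):

```shell
# Prefix table names and index targets with the climate schema.
sed -i -e 's/ TABLE / TABLE climate./g' \
       -e 's/ on / on climate./g' climate-pg-ddl.sql
```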
Export the data from MySQL:

mysqldump --skip-add-locks --complete-insert --no-create-db --no-create-info --quick --result-file="climate-my.sql" --databases climate --skip-comments -u root -p
It might be worthwhile to simply rename the tables and columns in MySQL to lowercase:
select concat( 'RENAME TABLE climate.', TABLE_NAME, ' to climate.', lower(TABLE_NAME), ';' ) from INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA='climate';
- Execute the commands from the previous step.
- There is probably a way to do the same for columns; I changed them manually because it was faster than figuring out how to write the query.
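An analogous query over INFORMATION_SCHEMA.COLUMNS can generate the column renames; MySQL's CHANGE syntax needs the column type repeated, which COLUMN_TYPE supplies. This is a sketch I have not run against 5.1:

```sql
SELECT CONCAT('ALTER TABLE climate.', TABLE_NAME,
              ' CHANGE ', COLUMN_NAME, ' ', LOWER(COLUMN_NAME),
              ' ', COLUMN_TYPE, ';')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'climate';
```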
The Database Strikes Back
Recreate the structure in PostgreSQL as follows:
pgadmin3 (switch to it)
- Click the Execute arbitrary SQL queries icon.
- Open climate-pg-ddl.sql.
- Search for TABLE " and replace with TABLE climate." (inserts the schema name).
- Search for on " and replace with on climate." (inserts the schema name).
- Execute the query.
This results in:
Query returned successfully with no result in 122 ms.
Replies of the Jedi
At this point I am stumped.
- Where do I go from here (what are the steps) to convert climate-my.sql so that it can be executed against PostgreSQL?
- How do I make sure the indexes are copied over correctly (to maintain referential integrity; I don’t have constraints at the moment to ease the transition)?
- How do I ensure that adding new rows in PostgreSQL will start enumerating from the index of the last row inserted (and not conflict with an existing primary key from the sequence)?
- How do you ensure the schema name comes through when transforming the data from MySQL to PostgreSQL inserts?
A fair bit of information was needed to get this far.
What I usually do for such migrations is two-fold:
- Extract the whole database definition from MySQL and adapt it to PostgreSQL syntax.
- Go over the database definition and transform it to take advantage of functionality in PostgreSQL that doesn’t exist in MySQL.
Then do the conversion, and write a program in whatever language you are most comfortable with that accomplishes the following:
- Reads the data from the MySQL database.
- Performs whatever transformation is necessary on the data to be stored in the PostgreSQL database.
- Saves the now-transformed data in the PostgreSQL database.
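A minimal bash sketch of those three steps, assuming a hypothetical table measurement with columns (id, taken). Only the transform is shown as a self-contained function, since the streaming ends need live servers, and the zero-date rule is an illustrative assumption, not something from the post:

```shell
#!/bin/sh
# Step 2 (transform): turn MySQL's zero dates into COPY-format NULLs (\N).
transform() {
  awk -F'\t' 'BEGIN { OFS = "\t" } $2 == "0000-00-00" { $2 = "\\N" } 1'
}

# Steps 1 and 3 (read from MySQL, write to PostgreSQL); requires both
# servers running, so it is left commented out here:
# mysql -N -B climate -e 'SELECT id, taken FROM measurement' \
#   | transform \
#   | psql climate -c '\copy climate.measurement FROM STDIN'

# Demonstrate the transform on two fabricated rows:
# the first row's date becomes \N; the second row is unchanged.
printf '1\t0000-00-00\n2\t1984-06-04\n' | transform
```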
Redesign the tables for PostgreSQL to take advantage of its features.
If you just use a sed script to convert the SQL dump from one format to the other, all you are doing is putting a MySQL database in a PostgreSQL server. You can do that, and there will still be some benefit from doing so, but if you're going to migrate, migrate fully.
It will involve a little bit more up-front time spent, but I have yet to come across a situation where it isn’t worth it.
Convert the mysqldump file to a PostgreSQL-friendly format
Convert the data as follows (do not use mysql2pgsql.perl):
Escape the quotes (MySQL escapes a single quote as \', but PostgreSQL expects ''; the dump's \r and \n escape sequences can be left alone, since PostgreSQL 8.4 interprets them in ordinary string literals):

sed "s/\\\\'/''/g" climate-my.sql > escaped-my.sql
Replace USE "climate"; with a search path and comment out the MySQL comments:

sed 's/USE "climate";/SET search_path TO climate;/g' escaped-my.sql | sed 's/^\/\*/--/' > climate-pg.sql
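As a sanity check, both substitutions can be exercised on fabricated one-line samples (the table name and values are made up):

```shell
# MySQL escapes a quote as \' ; PostgreSQL expects ''.
printf "%s\n" "INSERT INTO station VALUES (1,'O\\'Hare');" | sed "s/\\\\'/''/g"
# → INSERT INTO station VALUES (1,'O''Hare');

# The USE line becomes a search_path assignment.
echo 'USE "climate";' | sed 's/USE "climate";/SET search_path TO climate;/g'
# → SET search_path TO climate;
```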
Connect to the database.
sudo su - postgres
Set the encoding (mysqldump ignores its encoding parameter) and then execute the script.
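The commands themselves are not shown in the post; a plausible psql session looks like this (latin1 is an assumption; substitute whatever encoding the dump actually contains):

```sql
-- inside psql, connected to the climate database as postgres
SET client_encoding TO 'latin1';
\i climate-pg.sql
```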
This series of steps will probably not work for complex databases with many mixed types. However, it works for a simple, mostly numeric data set such as this one.
Indexes, primary keys, and sequences
Because mysqldump included the primary key values when generating the INSERT statements, they trump the tables' automatic sequences. Upon inspection, the sequences for all tables were still at 1 after the import.
Set the sequence after import
The ALTER SEQUENCE command will set them to whatever value is needed.
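For example, assuming a table climate.station whose id column draws from station_id_seq (names hypothetical), either form works:

```sql
-- Jump the sequence past the highest imported key:
SELECT setval('climate.station_id_seq', (SELECT MAX(id) FROM climate.station));
-- Or restart it at an explicit value:
ALTER SEQUENCE climate.station_id_seq RESTART WITH 1000;
```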
There is no need to prefix tables with the schema name. Use:
SET search_path TO climate;
If you’ve converted the schema, then migrating the data is the easy part:

Dump the schema from PostgreSQL (you said you have already converted the schema, so we dump it now, because we will be dropping and recreating the target database to have it clean):
pg_dump dbname > /tmp/dbname-schema.sql
Split the schema into two parts: /tmp/dbname-schema-1.sql, containing the CREATE TABLE statements, and /tmp/dbname-schema-2.sql, containing the rest. PostgreSQL needs to import the data after the table definitions are in place, but before the foreign keys, triggers, etc. are imported.
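The post does not say how to do the split; one crude sketch is to cut at the first constraint/index/trigger statement, assuming pg_dump's usual ordering of all CREATE TABLEs first (verify the result by eye):

```shell
# Everything before the first ALTER TABLE / CREATE INDEX / CREATE TRIGGER
# goes to part 1; that line and everything after it go to part 2.
awk '/^(ALTER TABLE|CREATE INDEX|CREATE TRIGGER)/ { part2 = 1 }
     { print > (part2 ? "/tmp/dbname-schema-2.sql" : "/tmp/dbname-schema-1.sql") }' \
    /tmp/dbname-schema.sql
```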
Recreate the database with only the first part of the schema:

-- from psql, while connected to another database (e.g. postgres)
drop database dbname;
create database dbname;
\c dbname
\i /tmp/dbname-schema-1.sql
-- now we have tables without data, triggers, foreign keys etc.
Import the data:

( echo 'start transaction;' ; mysqldump --skip-quote-names dbname | grep ^INSERT ; echo 'commit;' ) | psql dbname
-- now we have tables with data, but without triggers, foreign keys etc.
The --skip-quote-names option was added in MySQL 5.1.3, so if you have an older version, install a newer MySQL temporarily in a private directory (configure --prefix=/tmp/mysql && make install should do) and use its mysqldump.
Import the rest of the schema:

psql dbname
start transaction;
\i /tmp/dbname-schema-2.sql
commit;
-- we're done
Another option is the etlalchemy Python package:

pip install etlalchemy

from etlalchemy import ETLAlchemySource, ETLAlchemyTarget

# Migrate from MySQL to PostgreSQL
src = ETLAlchemySource("mysql://user:[email protected]/dbname")
tgt = ETLAlchemyTarget("postgresql://user:[email protected]/dbname", drop_database=True)
tgt.addSource(src)
tgt.migrate()