EDIT: I have been using Postgres with PostGIS for a few months now, and I am satisfied.
I need to analyze a few million geocoded records, each of which will have latitude and longitude. These records include data of at least three different types, and I will be trying to see if each set influences the other.
What database is best for the underlying data store for all this data? Here’s my desires:
- I’m familiar with the DBMS. I’m weakest with PostgreSQL, but I am willing to learn if everything else checks out.
- It does well with GIS queries. Google searches suggest that PostgreSQL + PostGIS may be the strongest? At least a lot of products seem to use it. MySql’s Spatial Extensions seem comparatively minimal?
- Low cost. Despite the 10GB DB limit in SQL Server Express 2008 R2, I’m not sure I want to live with this and other limitations of the free version.
- Not antagonistic with Microsoft .NET Framework. Thanks to Connector/Net 6.3.4, MySql works well C# and .NET Framework 4 programs. It fully supports .NET 4’s Entity Framework. I cannot find any noncommercial PostgreSQL equivalent, although I’m not opposed to paying $180 for Devart’s dotConnect for PostgreSQL Professional Edition.
- Compatible with R. It appears all 3 of these can talk with R using ODBC, so may not be an issue.
I’ve already done some development using MySql, but I can change if necessary.
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
I have worked with all three databases and done migrations between them, so hopefully I can still add something to an old post. Ten years ago I was tasked with putting a largish — 450 million spatial objects — dataset from GML to a spatial database. I decided to try out MySQL and Postgis, at the time there was no spatial in SQL Server and we had a small startup atmosphere, so MySQL seemed a good fit. I subsequently was involved in MySQL, I attended/spoke at a couple of conferences and was heavily involved in the beta testing of the more GIS-compliant functions in MySQL that was finally released with version 5.5. I have subsequently been involved with migrating our spatial data to Postgis and our corporate data (with spatial elements) to SQL Server. These are my findings.
1). Stability issues. Over the course of 5 years, we had several database corruptions issues, which could only be fixed by running myismachk on the index file, a process than can take well over 24 hours on a 450 million row table.
2). Until recently only MyISAM tables supported the spatial data type. This means if you want transaction support you are out of luck. InnoDB table type does now support spatial types, but not indexes on them, which given the typical sizes of spatial data sets, isn’t terribly useful. See http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html My experience from going to conferences was that spatial was very much an afterthought — we’ve implemented replication, partitioning, etc, but it doesn’t work with spatial.
EDIT: In the upcoming 5.7.5 release InnoDB will finally support indexes on spatial columns, meaning that ACID, foreign keys and spatial indexes will finally be available in the same engine.
3). The spatial functionality is extremely limited compared to both Postgis and SQL Server spatial. There is still no ST_Union function that acts on an entire geometry field, one of the queries I run most often, ie, you can’t write:
select attribute, ST_Union(geom) from some_table group by some_attribute
which is very useful in a GIS context.
Select ST_Union(geom1, const_geom) from some_table, ie, one of the geometries is a hard-coded constant geometry is a bit limiting in comparison.
4). No support for rasters. Being able to do combined vector-raster analysis within a db is very useful GIS functionality.
5). No support for conversion from one spatial reference system to another.
6). Since acquisistion by Oracle, spatial has really been put on hold.
Overall, to be fair to MySQL it supported our website, WMS and general spatial processing for several years, and was easy to set up. On the downside, data corruption was an issue, and by being forced to use MyISAM tables you are giving up a lot of the benefits of an RDBMS.
Given the issues we had with MySQL, we ultimately converted to Postgis. The key points of this experience have been.
1). Extreme stability. No data corruption in 5 years and we now have around 25 Postgres/GIS boxes on centos virtual machines, under varying degrees of load.
2). Rapid pace of development — raster, topology, 3D support being recent examples of this.
3). Very active community. The Postgis irc channel and mailing list are excellent resources. The Postgis reference manual is also excellent. http://postgis.net/docs/manual-2.0/
4). Plays very well with other applications, under the OSGeo umbrella, such as GeoServer and GDAL.
5). Stored procedures can be written in many languages, apart from the default plpgsql, such as Python or R.
5). Postgres is a very standards compliant, fully featured RDBMS, which aims to stay close to the ANSI standards.
6). Support for window functions and recursive queries — not in MySQL, but in SQL Server. This has made writing more complex spatial queries cleaner.
I have only used SQL Server 2008 spatial functionality, and many of the annoyances of that release — lack of support for conversions from one CRS to another, the need to add your own parameters to spatial indexes — have now been resolved.
1). As spatial objects in SQL Server are basically CLR objects, the syntax feels backwards. Instead of ST_Area(geom) you write geom.STArea() and this becomes even more obvious when you chain functions together. The dropping of the underscore in function names is merely a minor annoyance.
2). I have had a number of invalid polygons that have been accepted by SQL Server, and the lack of a ST_MakeValid function can make this a bit painful.
3). Windows only. In general, Microsoft products (like ESRI ones) are designed to work very well with each other, but don’t always have standard’s compliance and interoperability as primary objectives. If you are running a windows only shop, this is not an issue.
UPDATE: having played a bit with SQL Server 2012, I can say that it has been improved significantly. There is now a good geometry validation function, there is good support for the Geography data type, including a FULL GLOBE object, which allows representing objects that occupy more than one hemisphere and support for Compound Curves and Circular Strings which is useful for accurate and compact representations of arcs (and circles) among other things. Transforming coordinates from one CRS to another still needs to be done in 3rd party libraries, though this is not a show stopper in most applications.
I haven’t used SQL Server with large enough datasets to compare one on one with Postgis/MySQL, but from what I have seen the functions behave correctly, and while not quite as fully featured as Postgis, it is a huge improvement on MySQL’s offerings.
Sorry for such a long answer, I hope some of the pain and joy I have suffered over the years might be of help to someone.
If you are interested in a thorough comparison, I recommend “Cross Compare SQL Server 2008 Spatial, PostgreSQL/PostGIS 1.3-1.4, MySQL 5-6” and/or “Compare SQL Server 2008 R2, Oracle 11G R2, PostgreSQL/PostGIS 1.5 Spatial Features” by Boston GIS.
Considering your points:
- I’m familiar with the DBMS: setting up a PostGIS database on Windows is easy, using PgAdmin3 management is straight-forward too
- It does well with GIS queries: PostGIS is definitely strongest of the three, only Oracle Spatial would be comparable but is disqualified if you consider its costs
- Low cost: +1 for PostGIS for sure
- Not antagonistic with Microsoft .NET Framework: You should at least be able to connect via ODBC (see Postgres wiki)
- Compatible with R: shouldn’t be a problem with any of the three
PostGis definitely. Here’s why.
- Postgres is far superior to MySQL in performance. Server is more fault tolerant, has out of the box tools for load-balancing, caching and optimization.
- PostGIS is becoming a standard in GIS apps.
- It’s free.
Just an note that MySQL has finally added in proper GIS logic.
But I can’t comment on cost or performance at this stage
PostGIS is best because it is becoming a standard in GIS applications these days and PostGIS is free. It is far superior to MySQL in performance