Archive for August, 2006

And since I wrestled in High School

I am going to once again get dirty with the Pig. In writing that last post where I wrote about OS, I read some more of the Creating Passionate Users blog and came across a great post. Dmitri (and Manifold as his employer) need to read this post. I am now officially hesitant to use Manifold because of the outrageous statements and wrong information Dmitri posted in his responses. If you don’t know what you are talking about then it is better to say nothing. I am not criticizing Manifold software, since I have never used it; I am criticizing their refusal to muzzle Dmitri and to act somewhat professionally on their website, which is probably written by Dmitri as well. Even Manifold users who would like to promote the software find him and the website to be a problem.

At first I thought it was kinda funny but then reading more on the web site and his comments I started to get annoyed.

I have met people who make stuff up about all the cool things they have done in their life - and when they keep going and going your “smack talk meter” ™ starts to go off. Dmitri and the website make so many claims without any proof to back them up. And since I can’t get a demo version there is no way for me to see how much smack is flying around, so it all becomes unbelievable.

Who knows if I will use Manifold, but if I do it will be in spite of Dmitri and the web site.
By the way Doug, you receive little to no points for staying anonymous in the debate with Dmitri. It is awfully easy to criticize sitting behind an anonymous moniker.

Firefox gets wind of our discussion

So I guess Paul and I mentioned Firefox one too many times to escape the notice of the Moz foundation. Asa (one of the founding devs on Firefox) responds to Paul’s post and sets the record straight on the need for a firehose. I remember those discussions when they were happening at Moz (I am a passionate Netscape -> Mozilla -> Firefox user) and thinking it was a dumb idea to fork Moz for this smaller browser thing. Boy was I wrong, and I think I have taken that message to heart for how OS projects really gain traction against an entrenched commercial closed source entity. Reading blogs like Guy Kawasaki’s and Creating Passionate Users only further reinforced those feelings.

Please remember the comments I am making are directed specifically to Paul’s comments and not to open source GIS/spatial libraries in general.

Some clarification

I would have left some of these comments on Paul’s blog but non-blogspot.com folks can’t leave comments.

I agree that the funding stream is important and differs between Firefox and geotools. But I would also argue that Linux was in a situation like geotools or postgis a while ago, and it was by growing its user base that it got firehoses turned its way. It is easier to get people to contribute when they see how something benefits them directly. IBM contributed developers to Linux because their customers wanted to use Linux. I could see more money flowing uDig’s way if people felt like it did some more of what they wanted and was easier to use.

I don’t want uDig or GRASS to be ArcGIS, or even expect them to be that. I gave very specific functionality that I wanted so I could get “my” work done better. As a matter of fact if you made uDig like ArcGIS it would make me sad. I just want you to make things somewhat comfortable to me. Make me kick butt in as short a time as possible.

Most of the projects you point out are developer products but uDIG, GRASS, OpenJump, and QGIS are all targeted as desktop applications. PostGIS is kinda positioned as a data store for these desktop applications.
Since I have a developer background and I have done work with Excel in Java using POI, I have been trying to find the time to work on uDIG or openJump. I do see how I could contribute to the project.

I guess we fundamentally disagree over this statement (and by linkage association, Sean as well):

Open source is not about users, it is about developers. It is only about users in so far as users become sufficiently engaged in the project that they either become developers themselves, or support developers through careful bug finding or documentation.

I know there is debate in the OS community about this viewpoint, whether it was Moz versus Firefox, Linux, Linux desktops, Database admin GUIs. I guess for me it starts with users and a couple of developers. There are plenty of cases where starting from either perspective has killed a project, so I am not saying either perspective is “right”.

The one caveat to your statement is you can’t claim it is a developer product and then complain that people don’t use it. Look at how far that got ESRI with Server. Not that Paul has done that but there are other mumblings and such that go around.

My previous post was trying to help give some feedback about how to generate a critical mass of users. I may/probably will use GRASS, PostGIS, QGIS, and OpenJump or uDIG on my own time or for experimental purposes. And there you have it…

One final update after talking to James - my title for the original post was chosen on purpose. I am stuck in the middle - I really want to go to the OS side of the GIS world but given what I and my group need to do day in and day out I can’t. James points out people get what they pay for - and I would argue in this case it is way more than what they pay for - but it still doesn’t get me all the way there. Ok this might be a bad analogy but it is like a British car without the transmission for a person here in the US who needs to drive around now. I would love to buy it and I could get past the steering wheel on the wrong side, but I really need that transmission in the car when I buy it. After that point there is plenty of room for discussion. I hope the analogy didn’t make the whole thing worse so please don’t get too hung up on it…

The GIS user is stuck in the middle

Andrew H has a great post talking about why people should keep their geospatial data in a spatially enabled database (such as Oracle Spatial or PostGIS). And Dylan talks about the functionality of PostGIS in my comments. And I was intrigued by the ideas presented. Heck I was intrigued by uDIG as I was getting ready to leave ESRI. What fun to write in Java and work on geospatial technology and open source to boot. So I have dabbled some reading the doc, using some of the software, cruising the discussion forums and my conclusion is that there is serious work that needs to be done before this would be ready for a shop like ours. And by a shop like ours I mean one that has 5-6 GIS folks without a lot of programming time who need to be doing work that directly generates revenue.

Sometimes in this discussion I am going to talk about spatially enabled geodbs, but I am also going to talk about the usefulness of the OS geospatial stuff to me and some of the people I work with. So sometimes what I say will apply to Oracle spatial and other times it will be more about open source, please keep that in mind.

First on the spatially enabled RDBMSs - you guys are never going to get anywhere in small shop land without a good front end. We do a lot of editing and map production - without a high powered front end you are of no use to me. I don’t want one data format for my editing and map production and then export for storage. The front end has to give me high quality map production so I can produce figures for my reports and it better be easy - ArcPLOT sucked so don’t try to tell me I should go back there. Good topologies need to be in there right out of the box - I am done with slivers and gaps. For serious GIS users these are two basic requirements. If you gave me this I would struggle through the command line and the other funkiness of GRASS for my habitat models. I know QGIS and uDIG are supposed to be that UI, but they are not and until they are I am done looking at PostGIS. As for Oracle Spatial - I am not even sure what UI I should use. If Manifold wants to be that alternate UI they better start thinking about giving away free demos. $250 or whatever they charge for the full version is cheap but I am not going to spend that just to see if I want to use the software. Everybody is giving out free trials with a time bomb - c’mon, fish or cut bait. Their current model means I need manager approval to just try out the software - unlikely. But it must be fun to be the Manifold folks where you can say things like:

Faster, Smarter and more Capable than ESRI ArcView, ArcGIS 8 or MapInfo Professional …
Don’t get stuck using antique, “toy” GIS packages when you need serious combinations of raster and vector layers.

And here is the bigger dilemma I see for the OS folks…

They are fighting against people already being familiar with ArcGIS - which is very similar to Linux fighting against Windows. If you want people to switch you need to make the transition as painless as possible. Firefox got people to switch from IE by

  1. Making better software
  2. Not making users learn a new UI for interacting with the web
  3. Importing all their IE favorites
  4. THEN building in cool new features that keep people around

GIS users spent A LOT of time learning how to do things the ESRI way - and whether that is the right or wrong way is immaterial. Time AND money are precious and it takes both to switch to new software, especially when it requires new ways of doing things.

Here are some areas for you to focus on to make people think about switching

  1. Make things simpler - go with a UI geared towards very specific tasks. So even if it’s not the way I am used to doing things, if it is much simpler then that’s where I am going. If I had a GIS viewer application that put vectors on top of raster and also allowed me to intersect/buffer and calculate area I would have something that I could give to my non-GIS folks to do some of the more routine work we do. The UI should be like Word or Excel since that is what most people know. In this arena I think I can point to Google Earth as a good example…
  2. To go for the GIS pros make most of the usual tasks very similar to the way things are done in ArcGIS. You have 20-30 minutes of me dinking around with the software before I chuck it or keep it, so at least suck me into playing around for 20 more minutes and so on and so on. Not everything has to follow ArcGIS convention but some of the more common tasks need to have a way to do things that makes sense to an ArcMap user. Again you need to, at the very least, have good editing tools and good map production tools. One of the big things people were talking about from both this year’s and last year’s UC is the new cartographic representation, because a lot of people make maps for a good chunk of their work week.
  3. For both of these points read the Creating Passionate Users blog - one of the best on the net. Particularly good posts are Featuritis (which a rather well known GIS software vendor is known for) and Attenuation. The graph seems to be gone from her original post, but Google Images found it here.

Let me try to sum up:

  1. I like OS software in general - I use it all the time and have contributed to the doc and filed bugs for some of the projects, and even helped a tiny amount with an OS GIS project (look for TheSteveMonkey) (there is more but I am feeling lazy tonight). I would love to see a strong OS solution in the GIS space that helped me do what I need to do. I resent the fact that ESRI is the only software I have to use which prevents me from running a Linux or Mac machine.
  2. Most GIS users just want to get work done - they may even be sympathetic to the idea of OS software - but at the end of the day they want to go do other things and they want their paycheck. Don’t expect big contributions in either time or money unless you can sell them something worth more than what they already have. And don’t forget to include the cost of learning new tech and time spent migrating data.
  3. There is a huge institutional investment in ESRI technology and you don’t switch people by making the switch hard. You need to give them easy ways to switch which still allow them to do what they need to do.
  4. Most of my statements apply to me and what I see as the “smaller” GIS shops. Some of the issues I talk about here may not be an issue for the larger GIS shops or those not in a cost-recovery/billable time arena.

I would love to hear other people’s thoughts on some of the ideas I have laid out here. For now, as much as I would like to go to a spatially enabled RDBMS, rather than an object-relational spatial data engine, there are too many barriers in the way to make it practical for work. And for that reason I feel like a GIS user stuck in the middle. I want to use a storage technology that makes sense with the work I do (a spatially enabled RDBMS) with a front end that makes working with that technology easy and efficient (ArcMap).

Steve’s list of improvements to geodatabases

I know that ESRI appreciates customer feedback and so this post is written in that vein. I have spent some time now wrestling with trying to put all the data for a project into a geodatabase, both my spatial and non-spatial data. In the end I decided this was not a good idea given the smaller size of the project, the requirements of the project, and the machinations I would have been forced to use.

I could have done a bunch of what I wanted/expected with a class extension but the trade-off in using them was too much. It wasn’t the writing of the code that was the problem, since there is already a sample which does 50% of what I want and I could see where to put the rest. The real problem is that if the dll isn’t installed there is nothing, and I mean nothing, that ArcObjects can do with the dataset (I tried a bunch of different ways to look at the data and they all failed). And locking up a dataset like that freaks me out. I know the personal geodb is locked up but as long as someone goes and buys ArcView they can look at the data. If I add the class extension, then if, for some strange and unforeseen reason, the dll goes missing, there is no way anyone else can look at the data.

Using application logic wasn’t satisfactory either because there was nothing to stop users from opening ArcMap and editing away. One reason I use a database is because I would like to enforce some rules about the data no matter how the data is accessed.

So here is an abbreviated list of some of the improvements I would like to see ESRI do for geodatabases.

1. Ability to generate a primary key. There is no built-in geodb type that I can use to autogenerate a primary key. As I mentioned before you can not use ObjectID. If you look at most of the models that are shared on the ESRI site they use text keys, usually because there is some key already generated for the item, such as a FIPS code. Depending on the RDBMS used, you can go in after the geodb is generated and assign an autogeneration column type by hand. The problem with personal geodbs is that Access only allows 1 autoincrementing column per table, which is taken by ObjectID.

2. Ability to make a unique index on one of my columns and still be able to edit the data. This would help with the item above, since I could have the users enter text or something else and then ensure that it is unique.

3. Make up your mind as to whether or not we should use UML. If you look on this page you will notice that ESRI provides its models in UML and says that this can be used as a tool, but then later on the page they say that UML is not really recommended. If UML is not the right tool, which I partially agree with since making the UML models for geodbs is rather tedious, then what should I use for modelling? For anything past a few tables and feature classes I want to use a graphical tool. I need that picture to help visualize the relationships and I need it to help communicate with other developers and users. You know it is not that big a deal to me that I can’t model a topology. What tool should I use?
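For readers coming from the relational side, items 1 and 2 are roughly what a plain RDBMS gives you out of the box. Here is a minimal sketch using Python's built-in sqlite3 module, not a geodatabase. The table and column names (weed_patch, patch_id, patch_code) are made up for illustration, and nothing here touches ArcObjects or any ESRI API.

```python
import sqlite3

# In-memory SQLite database; this stands in for "any relational store",
# not for a personal or enterprise geodatabase.
conn = sqlite3.connect(":memory:")

# Hypothetical table: patch_id is autogenerated by the database,
# patch_code is the text the user types in.
conn.execute(
    """
    CREATE TABLE weed_patch (
        patch_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        patch_code TEXT NOT NULL
    )
    """
)
# Unique index on the user-entered column; the table stays editable.
conn.execute("CREATE UNIQUE INDEX idx_patch_code ON weed_patch (patch_code)")


def add_patch(code):
    """Insert a patch and return the key the database generated for it."""
    cur = conn.execute("INSERT INTO weed_patch (patch_code) VALUES (?)", (code,))
    return cur.lastrowid


first_id = add_patch("P-001")
second_id = add_patch("P-002")

# A second row with the same code is rejected by the unique index.
try:
    add_patch("P-001")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The point of the sketch is only that the rule lives in the database, so it holds no matter which client does the editing.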

Which I think leads to the crux of the problem - the way ESRI has implemented object relational is not working for me. They have basically taken the path of the grape - and so there are three options, none of which look awfully likely.

1) Go full on object - be up front about it and take the hit. State that the DB is just an object store and people shouldn’t mess around with it. Using this route it will be clear that the only way into the geodatastore is through ArcObjects and we shouldn’t spend time or effort trying to go down any other route. This would probably cause a lot of customers, especially the bigger ones who have a large investment in RDBMSs, to complain loudly. For now, the file geodatabase is an answer to this request. There is no way into a file geodb without ArcObjects. I am also sure this is why there will eventually be ODBC and other interfaces to the file geodb, which will bring us back to the current state.

2) Go much closer to the relational route: stop storing domains as blobs, give me Primary keys, timestamps, let me design with ER diagrams… This has the benefit of giving people a technology that is familiar to them and they also have better tools to work with than geodatabase. This would be a lot of work on ESRI’s part as good chunks of the object model would have to be rewritten and there would be much less business logic in ArcObjects.

3) Do a better job of supporting both sides of the object relational equation. Build a diagramming tool that will help professionals model their geodb in a way that makes sense for geodbs. Add into the default ArcObjects a class extension that makes primary keys and timestamp datatypes. It would also help if there was better organization of the documentation and samples (or even a better search). I haven’t thought of all the things I would need but it seems like the relational side needs the most help.

Since none of these options are going to happen in the near future I guess I will maintain separate data stores and probably try to keep things out of the geodb unless I absolutely need them in there.

My final list for this post is:

1) I am hoping that some of my issues will be pointed out as stupid by those with more experience or knowledge of geodbs. I will probably argue back for a bit and then be thankful that someone showed me that it does not have to be as bad as my experience.

2) I am sure the geodb is working fine for quite a lot of people, so these are things which just made my work life ugly for the last week or so. I actually see how this could work for larger shops with larger IT staffs, or for smaller shops who have more programmers on staff.

3) I will try to write a follow up on why the full relational DB route would be no picnic now.

mental note to myself

There are geoprocessing tools for exporting and importing domains to tables and vice versa. Find them under Data management -> Domains. Don’t forget with GP coming into engine and server you can use these tools rather than writing the low level code to do the export. I assume they require editor (engine geodb) or ArcInfo licensing. Another interesting tech article on domains shows how to export a featureclass to a shapefile and convert the numbers in a field with a domain to their text values. Handy dandy…
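As a reminder of what that conversion amounts to, here is a tiny plain-Python sketch of swapping coded domain values for their text descriptions. It does not use the geoprocessing tools or any ESRI API; the domain contents, field name, and rows are all hypothetical.

```python
# Hypothetical coded-value domain: stored code -> human-readable description.
COVER_DOMAIN = {1: "Sparse", 2: "Moderate", 3: "Dense"}


def decode_field(rows, field, domain):
    """Return copies of rows with `field` replaced by its domain description.

    Codes that are not in the domain are left unchanged.
    """
    return [{**row, field: domain.get(row[field], row[field])} for row in rows]


rows = [{"patch": "P-001", "cover": 2}, {"patch": "P-002", "cover": 9}]
decoded = decode_field(rows, "cover", COVER_DOMAIN)
```

Leaving unknown codes alone is a design choice for the sketch, so bad values stay visible instead of silently turning into blanks.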

The good Dr Dave on the front of the NYTimes

One of my professors from grad school, Dave Wagner, is in one of the videos on the front of the NYTimes (I linked to the version in the media section). He is a really fun guy to talk to and he has some great caterpillars in his lab. I still remember the time my friend Manuel’s dog stole Dave’s hamburger by executing a flying leap and grabbing the burger and bun right from Dave’s hands. Dave said with an effort like that Argus deserved to keep the catch.

Why did nobody suggest this article early on

There is a white paper on using SQL to edit and work with geodatabases. It is not the white paper I was asking for but it does answer some of my earlier questions about using SQL directly to query and edit a geodb. It also covers working with versioned data. If you know there are things like this out there then just tell me how to search for them. I looked all over support and ran Google searches, but in the end I found it in the project center.

White paper for ESRI to write

I was explaining my troubles to Naomi today and we came up with a great idea for a white paper or tech article for ESRI to write:

Geodatabases for relational database users

There are many more relational database users out there and they could use some help transitioning to the geodb. There are constructs and patterns that they will be used to using that do not work in geodbs. There is plenty of material explaining concepts in the geodb and how to design a geodb but these assume that the person doesn’t have a mental model for working with an RDBMS. If you want to appease these people and make them feel comfortable about putting their data in a geodb then do some hand holding. There needs to be a clear set of rules to help people make the transition. It would help keep a lot more hair on people’s heads…

OBJECTIDs, Primary Keys, and Documents for geodb best practices

I am now on to making many to many relationship classes because a patch of Weeds can have many species and a species can belong to many patches. I have a little test UML diagram where I try out the concepts for a simple schema. I try to create a Many to Many between a table and a polygon featureclass. I can’t get the relationship class to understand if I specify the objectid as the primary key and the polygonid (integer) as the foreign key. If I change it to some arbitrary text field for both PK and FK it works fine.
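For comparison, this is how the same patch/species many-to-many looks in a plain relational store with text keys on both sides. It is a minimal sketch using Python's built-in sqlite3 module, not a geodatabase relationship class. All table names, column names, and sample values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

# Two entity tables keyed by text, plus a junction table holding one row
# per (patch, species) pair. The composite primary key blocks duplicates.
conn.executescript(
    """
    CREATE TABLE patch   (patch_key   TEXT PRIMARY KEY);
    CREATE TABLE species (species_key TEXT PRIMARY KEY);
    CREATE TABLE patch_species (
        patch_key   TEXT NOT NULL REFERENCES patch (patch_key),
        species_key TEXT NOT NULL REFERENCES species (species_key),
        PRIMARY KEY (patch_key, species_key)
    );
    """
)

conn.executemany("INSERT INTO patch VALUES (?)", [("P-001",), ("P-002",)])
conn.executemany("INSERT INTO species VALUES (?)", [("knapweed",), ("thistle",)])
conn.executemany(
    "INSERT INTO patch_species VALUES (?, ?)",
    [("P-001", "knapweed"), ("P-001", "thistle"), ("P-002", "thistle")],
)


def species_in(patch_key):
    """All species recorded in one patch, sorted by key."""
    sql = "SELECT species_key FROM patch_species WHERE patch_key = ? ORDER BY species_key"
    return [row[0] for row in conn.execute(sql, (patch_key,))]


def patches_with(species_key):
    """All patches that contain one species, sorted by key."""
    sql = "SELECT patch_key FROM patch_species WHERE species_key = ? ORDER BY patch_key"
    return [row[0] for row in conn.execute(sql, (species_key,))]
```

Here `species_in("P-001")` walks the relationship one way and `patches_with("thistle")` walks it the other, which is all I am asking the relationship class to do.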

Then I do some searches on support and find this gem of a thread, which basically states never use the OBJECTID field as a PK because when you export the featureclass all the items may renumber themselves. WTF? Which may lead to a much larger philosophical problem. How do I autogenerate primary keys? Can I use autogenerated OIDs as long as they are not in the ObjectID field? Oh wait - you can not have more than one OID field in a featureclass. So it looks like there is no way to get an autogenerated column in a geodb without writing code. This has to be wrong. Please for the love of all that is right in the world of data consistency let this be wrong!

Where is this written? Where would I even begin to look for this? It would be nice to know these things before you actually design a geodb and have it in production.
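To make the renumbering problem concrete, here is a small plain-Python sketch of why a key generated in code and stored in its own field survives an export while the object id does not. The export is simulated by a made-up function (export_rows) that renumbers ids from 1. It is an assumption about the behavior described in that thread, not ESRI code, and the field names are hypothetical.

```python
import uuid


def new_key():
    """Generate a text key in code, independent of any database row id."""
    return str(uuid.uuid4())


def export_rows(rows):
    """Simulate an export that renumbers object ids from 1 and copies other fields."""
    return [
        {"objectid": new_oid, "patch_key": row["patch_key"], "name": row["name"]}
        for new_oid, row in enumerate(rows, start=1)
    ]


# Object ids with gaps, as you would get after some deletes.
rows = [
    {"objectid": 7, "patch_key": new_key(), "name": "north field"},
    {"objectid": 12, "patch_key": new_key(), "name": "creek edge"},
]
exported = export_rows(rows)
```

After the simulated export the object ids are 1 and 2, so anything that stored 7 or 12 as a foreign key now points at the wrong row or at nothing. The patch_key values come through untouched, which is why the text key works where the object id does not.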
