Data model design best practices
I need some recommendations from MPP gurus.
We are having a data model that is more or less 3-NF compliant and I feel that it needs
some optimization for fast reads as it will be used for analysis only. (i.e. elimination of separate lookup tables).
Is there an online web resource or a very good book that outlines best practices how to optimize the 3NF design for analysis on Greenplum like (MPP) system.
Thank you in advance.
unfortunately I have not seen a book that covers logical/physical modelling of distributed databases (data warehouses) based on shared nothing architectures such as Greenplum (I would not mind a recommendation too).
The good thing of Greenplum is that, by exploiting the power of MPP and data distribution, it allows you to start using a 3NF database for a data warehouse (something that it could not even be imagined on standard databases), rather than with a typical star-schema approach with flattened dimensions. Technically we could see the 3NF model as a snow-flake model using a data warehouse modelling terminology.
Nobody prevents you from adopting an agile approach to data warehousing (which is something we very often recommend, as the data warehouse is continuously changing and adapting itself). Greenplum allows you to use the data warehouse as a sandbox and try yourself by simply using SQL statements, temporary tables, different schemas, etc directly inside the database.
One important thing you should consider though is access to data and - in particular - to dimensional data. Greenplum (and shared nothing architectures) may suffer from what is known as "redistribution motion" during joins (I suggest you look for this term in the admin guide). This forces you to try and think about usage patterns when distributing facts and dimensions.
I do not know what backgrounds you have in data warehousing, but if you want I can definitely recommend some books on this topic. Let me know.
One book that I still consider interesting (but it is more for database physical modelling in general) is this: http://www.amazon.com/Physical-Datab...ews/0123693896 . Despite some bad reviews, I still found some parts of it quite useful.
Keep us posted,
Tags for this Thread