jason (61)
michael (35)
andy (29)
sharon (29)
daniel (27)
leslie (22)
tulipgirl (22)
brian (13)
ben (12)
foomonkey (10)
aj (6)
joe lance (6)
pudding (6)
jeff (5)
agirloutthere (4)
beck (4)
beth (4)
c.davis (4)
davidm. (4)
joe (4)
phil w (4)
rebecca (4)
aaron (3)
andy duncan (3)
hugo (3)
jeremy clifton (3)
joedumas (3)
joshua (3)
kevin (3)
October 2008
September 2008
June 2008
May 2008
April 2008
March 2008
February 2008
January 2008
December 2007
November 2007
October 2007
September 2007
August 2007
July 2007
June 2007
May 2007
April 2007
March 2007
February 2007
January 2007
December 2006
November 2006
October 2006
September 2006
August 2006
July 2006
June 2006
May 2006
April 2006
March 2006
February 2006
January 2006
December 2005
November 2005
October 2005
September 2005
August 2005
July 2005
June 2005
May 2005
April 2005
March 2005
February 2005
January 2005
December 2004
November 2004
October 2004
September 2004
August 2004
July 2004
June 2004
May 2004
Chatter
Chickamauga Lock Project
Computer
Ed Bryant for Senate
Environment
Family
Fox Den Lane
GulfShore Baptist Assembly
Homebrew
Hurricane Katrina
Movies
Music of New Orleans
National Lampoon's European Vacation
Now Discover Your Strengths
Opinion
Politics
quotes
Recipes
Religion
Search
Signs
Things to do in Chattanooga
Torchbearers
Work
August 10, 2006
Horizontal vs. Vertical Database Design
I was at one point tasked with developing the argument against going with a vertical database design for a particular project. In my research one of the main reasons for going vertical was the null saturation percent of a database, which was caused by the number of columns added to support a very large eCommerce database. As some research projects go I never got a chance to finish, and because I have a feeling that one day in the future I will have to revisit this subject I am archiving what I had done out here on my blog.
The one neat thing that did come out of my research was this query that I wrote to get the null saturation of a MS SQL Server database. If you are curious about the null saturation of a SQL Server database then feel free to play with this query. Depending on the size of the database it could take up to 5 minutes or longer to run. I'm sure the query could have been written better, but it was written for research not as a production piece of code. Also, this query can not be run under the master database.
Download Query (the query in this file will make more sense because it still has some formatting to it)--Written By David Gidcumb
create
TABLE #NumColumnsRowsPerTable(
ID
INT
, Name varchar(100)
, Columns INT
, Rows bigINT
, TotalCellCount bigint
, NonNullCellsPerTable bigINT
, NullsPerTable bigint
)
create
TABLE #Columns(
Name varchar(100)
, NotNullCount bigint
)
--Number of Columns per table
INSERT
INTO #NumColumnsRowsPerTable
select
so.ID
, so.Name
, count(*)
, 0
, 0
, -1
, 0
from sysobjects
so
left join
syscolumns sc
on so.ID
= sc.ID
where so.type = 'U'
and sc.xtype
not in(35,36,34,99) --none
nullable types
group by so.name,
so.ID
Declare
@tablename varchar(100)
Declare
@columnname varchar(100)
Declare
@SelectCMD varchar(2048)
Declare
@SelectCMD2 varchar(2048)
Declare
@SelectCMD3 varchar(2048)
-- Declare the Cursor for getting Row Count
DECLARE
GetRowCount_table_names_cursor CURSOR FOR
SELECT
Name
FROM #NumColumnsRowsPerTable
OPEN
GetRowCount_table_names_cursor
FETCH
NEXT FROM GetRowCount_table_names_cursor
INTO
@tablename
WHILE
@@FETCH_STATUS = 0
BEGIN
set
@SelectCMD = 'Update #NumColumnsRowsPerTable set rows = (select
count(*) from xxxTABLExxx), TotalCellCount =(columns * (select count(*) from
xxxTABLExxx)) where name = ''xxxTABLExxx'''
select
@SelectCMD2 = REPLACE ( @SelectCMD , 'xxxTABLExxx'
, @tablename )
execute
(@SelectCMD2)
INSERT INTO #Columns
select sc.Name
, 0
from sysobjects
so
left join
syscolumns sc
on so.ID = sc.ID
where so.type
= 'U'
and so.Name =
@tablename
and sc.xtype not in(35,36,34,99)
Declare @NotNullCount
int
-- Declare the
Cursor for getting Null Count
DECLARE GetNotNullCount_cursor CURSOR FOR
SELECT Name
FROM #Columns
OPEN GetNotNullCount_cursor
FETCH NEXT FROM GetNotNullCount_cursor
INTO @columnname
WHILE @@FETCH_STATUS
= 0
BEGIN
set @SelectCMD
= 'update #Columns set NotNullCount =(select
count(xxxCOLUMNxxx) from xxxTABLExxx where xxxCOLUMNxxx is not null) where name
= ''xxxCOLUMNxxx'''
select @SelectCMD2
= REPLACE ( @SelectCMD , 'xxxTABLExxx' ,
@tablename )
select @SelectCMD3
= REPLACE ( @SelectCMD2 , 'xxxCOLUMNxxx' ,
@columnname )
print @SelectCMD3
execute (@SelectCMD3)
--select
@SelectCMD3
-- Get the
next table name
FETCH NEXT FROM GetNotNullCount_cursor
INTO @columnname
End
CLOSE GetNotNullCount_cursor
DEALLOCATE GetNotNullCount_cursor
Update #NumColumnsRowsPerTable set NonNullCellsPerTable
= (select sum(NotNullCount) from #columns), NullsPerTable =(TotalCellCount-(select sum(NotNullCount) from
#columns)) where name = @tablename
delete from #Columns
-- Get the next table name
FETCH NEXT FROM GetRowCount_table_names_cursor
INTO @tablename
END
CLOSE
GetRowCount_table_names_cursor
DEALLOCATE
GetRowCount_table_names_cursor
GO
select
* from #NumColumnsRowsPerTable
select
cast(((sum(NullsPerTable)*100)/sum(TotalCellCount)) as
varchar(100)) + '% Null Saturation' from #NumColumnsRowsPerTable
drop
table #NumColumnsRowsPerTable
drop table #Columns
Please feel free to ignore everything below the fold. It was a work in progress that is woefully incomplete.
HistoryIn the past horizontal databases where used exclusively in database design. But, as e-commerce gained in popularity the shortcomings of the Horizontal design began to become apparent. At first the solution was to just add new columns to support the unique attribute of each new product being added. As new columns were added problems unique to a Horizontal design were encountered.
They were:
1. Max column count exceeded (DB2\Oracle 1012 SQL Server 1024)
2. Null saturation of a table would start to exceed 90%
3. Constant Scheme changes due to adding columns
4. Performance issues are encountered when a table is very wide and only a few columns are returned in the query
Vertical Database Design
The solution to the e-commerce predicament was to change the design by taking the attributes (column names) in the horizontal design and making them keys in the vertical design. This change solved all the issues, which occurred under a horizontal design. Though there are problems that resulted which are unique to a vertical database design.
1. Queries written against the vertical design became cumbersome and error-prone.
? Multiple joins to pull back each attribute
? Maximum of 256 tables for each select statement(SQL Server) run query contained in [256 Table proof for Horizontal design.xls] as example
? See file query.sql as example of what needs to happen to pull back data.
2. Current Application development tools are designed for storing data object in a horizontal format. Data displayed in a horizontal format in a ADO.Net datagrid will make no sense to a client. Data would have to be converted from vertical to horizontal to make sense. Again see bullet 1.
3. Logical horizontal views must be written on top of vertical representation
? SQL Server view performance is poor when compared to teradata, and views execution plan must be recompiled when accessed.
4. Data is not strong typed do to the nature of the vertical design, which causes casts or converts to be needed for most attributes (columns) being returned. The biggest issue is that for a large enterprise it removes the natural typing of data at the database level, and instead moves to responsibility to each developer who inserts and updates the data-store.
The decision on weather to use a Vertical or Horizontal data design really depends on the scope of a project. The decision really needs to be made not only by the Data Analyst, but also the developers who are going to be interacting with the data store on a daily bases. Chances are that the Data Analyst left to their own doing will produce a vertical design. It is much easier to design because if a new item needs to be added all that has to be done is a simple insert into a table. But just because it is simple to design it does not mean it is the most appropriate design pattern to apply.
Reason's to Use a Vertical Database design
? Due to the constant adding of products in an e-commerce system i.e. Amazon, eBay and the like, new products with unique attributes are added by adding new columns, which describe the unique attributes of each new product a vertical architecture simplifies the issue for the data analyst. Issues that Vertical Design negate
1. Column count can exceed that 1024 for Microsoft SQL Server
2. Inserts and Stored procedure parameters can contain a maximum of 1024 columns
Reasons to Use a Horizontal Database design .
.
.
.
Views 256 tables is the max in a select statement
Posted by gid at August 10, 2006 06:53 PM
Great. This was exactly what I was looking for. Thanks for putting this together.
Posted by: kettsar at June 17, 2008 05:14 PMI'm having a big2 performance trouble with Vertical Database Design used for product management.
Here is the background:
Product 1--* Variation 1--* Product_Information 1--* Variation (for upsell etc)
As you can see from that simple relations to retrieve a product information for one product is a nightmare.
Plus, there is a database logic to optimise the storage so for example (if a product variation doesn't have the information which is stored vertically in the product_information it will use the default value from a parent product_variation)
Any suggestion of how to optimize it from Front End or SQL Select?
Personally In performance and storage the DB is small but when you perform a query with complex logic it required a long SQL select and for the front end it required multiple call to database.
Same pain when you perform a DBEntry programming (is a nightmare)
I tried to create simple join and output an XML from SQL. I think it will reduce the SQL Server CPU consumption but, will it kill the WebServer (if it is transformed programmatically), and i'm sure it will kill the user if I spit up the XML or make it a phisical product per product XML file accesible to public.
Would (XSLT + XML) from a physical file better than Database Call?
I wonder how Amazon done it with millions of products.
Posted by: Edwin at November 11, 2008 12:06 AMEdwin,
XML in theory should never be faster than a DB call. I know I am working with a project now that uses a vertical data model extensively, and the only time there is a performance issue is when we are trying to pull back 100s of items at one time. What data base are you using? Are you using indexes where appropriate? Have you run statistics on the db?
Posted by: gid at November 11, 2008 07:15 AM