Saturday, February 25, 2012

BLOBS in a sql server database - yes or no?

Hi,
we have a sql server 2000 database running that manages amout 10 million
invoices (growing at 10000/day). in addition, we have a file server that
holds all invoices as pdf files. the problem is that it is extremely
difficult to backup these files or to do other file-realted task since there
are so many of them.
thats why we started thinking abut moving all these files directly into the
database as binary objects (BLOBs) so that invoice meta data and content is
at the same place. the average file size is about 40k. im not a big fan of
blobs but in this case it seems to be a good thing since backup and file
management wouldnt be a problem anymore (among other advantages).
my question is if you already experienced having large amounts of blobs in
the database and if you think its a good idea or not. i don't want our
overall query performance to be affected in any way (i know that sql server
stores pointer to the blob data in the tables so it shouldnt have a great
impact on performance, should it?). i'm interested in your experiences and
recommendations.
btw.: we would upgrade the hardware to something like this:
2 x Xeon 64, 3.6 GHz
8 GB RAM
10-14 disk drives for a total of 1-2 TB
i think it should be enough.
thanks in advance,
Benjamin Janecke
The knock on blobs in the database is really that they are inefficient for
the application that is going to process them. You can't directly stream
from the database out over a network file/web protocol as you can from the
filesystem, and most applications read from the file system so that you have
to first copy the blob into a file in order to process it. The benefit of
storing what are essentially files in blobs is, as you describe below, the
management benefits. From syncronized backup to avoiding broken links. We,
for example, put the T&Cs for our promotion codes in the database. If you
want a description of the promotion then you click on a hyperlink and we pop
up a page populated from the database. This isn't a super high volume
occurence so we aren't worried about the extra overhead involved. What we
gain is we don't have to worry about a link to a file describing the promo
code getting broken and the user getting either no description or the wrong
description.
As you already know the blob (unless it is very small) isn't stored in the
row and thus there is little or no performance impact on normal queries from
having the blob in the database.
The preeminent example of storing blobs in the database is the Terraserver
(http://terraserver.microsoft.com) which stores multiple Terabytes of
imagery in blobs and serves it out over the internet. This has been running
successfully since 1998, at times handling a few million hits per day. I
spent a few sleepless nights helping deal with problems the week it went
live (and took a lot more hits then we'd expected), and none of those were
related to the use of blobs.
Personally, my criteria for storing your invoices in the database would
involve two decisions. One is how long you wish to retain them (short, then
don't bother storing them in the database) and the other is how frequently
they are accessed. Since keeping invoices in this manner would typically be
for archival purposes and they would almost never be opened, you really
don't care about the performance hit versus keeping them in the file system
when one needs to be displayed. So the management benefits would outweigh
the performance negatives.
Hal Berenson, President
PredictableIT, LLC
http://www.predictableit.com
"Benjamin Janecke" <Benjamin Janecke@.discussions.microsoft.com> wrote in
message news:A5EF2AD5-B2F1-48BD-A9A7-41D42D63301F@.microsoft.com...
> Hi,
> we have a sql server 2000 database running that manages amout 10 million
> invoices (growing at 10000/day). in addition, we have a file server that
> holds all invoices as pdf files. the problem is that it is extremely
> difficult to backup these files or to do other file-realted task since
> there
> are so many of them.
> thats why we started thinking abut moving all these files directly into
> the
> database as binary objects (BLOBs) so that invoice meta data and content
> is
> at the same place. the average file size is about 40k. im not a big fan of
> blobs but in this case it seems to be a good thing since backup and file
> management wouldnt be a problem anymore (among other advantages).
> my question is if you already experienced having large amounts of blobs in
> the database and if you think its a good idea or not. i don't want our
> overall query performance to be affected in any way (i know that sql
> server
> stores pointer to the blob data in the tables so it shouldnt have a great
> impact on performance, should it?). i'm interested in your experiences and
> recommendations.
> btw.: we would upgrade the hardware to something like this:
> 2 x Xeon 64, 3.6 GHz
> 8 GB RAM
> 10-14 disk drives for a total of 1-2 TB
> i think it should be enough.
> thanks in advance,
> Benjamin Janecke
|||thanks for your very informative post.
we store the invoices for both archive purposes and daily access. the
database is part of an application that enables our customers to view their
invoices online. the invoices can also be accessed by our customer care
employees. in addition, we provide functionality to send invoices via e-mail
etc. in total, i think we have more than 1000 requests a day for pdf files,
but much less than 10000, so its not a big deal.
the good news is that the application doesn't depend on accessing the
invoices as files cause we only read them to deliver them as binary data to
the webbrowser (using special content type). therefore it wouldn't be a
problem to get binary data back from the database instead of opening physical
files. for e-mail-sending it is also possible to include binary data as
attachment instead of providing attachment files. thus, i don't think there
will be a big negative performance impact on the application except that the
data has to go through the network.
So, we will propably give it a try...
best regards,
Benjamin Janecke
"Hal Berenson" wrote:

> The knock on blobs in the database is really that they are inefficient for
> the application that is going to process them. You can't directly stream
> from the database out over a network file/web protocol as you can from the
> filesystem, and most applications read from the file system so that you have
> to first copy the blob into a file in order to process it. The benefit of
> storing what are essentially files in blobs is, as you describe below, the
> management benefits. From syncronized backup to avoiding broken links. We,
> for example, put the T&Cs for our promotion codes in the database. If you
> want a description of the promotion then you click on a hyperlink and we pop
> up a page populated from the database. This isn't a super high volume
> occurence so we aren't worried about the extra overhead involved. What we
> gain is we don't have to worry about a link to a file describing the promo
> code getting broken and the user getting either no description or the wrong
> description.
> As you already know the blob (unless it is very small) isn't stored in the
> row and thus there is little or no performance impact on normal queries from
> having the blob in the database.
> The preeminent example of storing blobs in the database is the Terraserver
> (http://terraserver.microsoft.com) which stores multiple Terabytes of
> imagery in blobs and serves it out over the internet. This has been running
> successfully since 1998, at times handling a few million hits per day. I
> spent a few sleepless nights helping deal with problems the week it went
> live (and took a lot more hits then we'd expected), and none of those were
> related to the use of blobs.
> Personally, my criteria for storing your invoices in the database would
> involve two decisions. One is how long you wish to retain them (short, then
> don't bother storing them in the database) and the other is how frequently
> they are accessed. Since keeping invoices in this manner would typically be
> for archival purposes and they would almost never be opened, you really
> don't care about the performance hit versus keeping them in the file system
> when one needs to be displayed. So the management benefits would outweigh
> the performance negatives.
> --
> Hal Berenson, President
> PredictableIT, LLC
> http://www.predictableit.com
>
> "Benjamin Janecke" <Benjamin Janecke@.discussions.microsoft.com> wrote in
> message news:A5EF2AD5-B2F1-48BD-A9A7-41D42D63301F@.microsoft.com...
>
>

No comments:

Post a Comment