On Fri, 9 May 2008, Srini wrote:
> Which is more efficient, and why: storing and reading a file in a database,
> or on disk, in a web server environment? Assume that things like the
> network, connections and memory are ideal.
>
> My knowledge says that using a database is more efficient, but a few of my
> peers argue that storing on disk is more efficient when people are
> accessing it concurrently...
Databases typically store their data in a file [1]. That means that a
file-based solution can always be at least as fast as using a database,
because it can just do what the database does.
The problem is that to make a file-based solution that's as fast as a good
database and also provides things like transactionality, you may have to
write code that's as complex as a database. Which is not good.
If you're mostly reading your data, so you don't have to worry about
concurrency and transactionality, and you have a straightforward
organisation (like having fixed-size records which you can refer to by
index in a sequence), then you can write a simple file-based
implementation that should be faster than a database, because it avoids
the overhead and complexity.
There's nothing in the Java libraries, that i'm aware of, for doing this
kind of non-database structured file access. There are things for some
textual formats, like XML and properties files (remember those?!), but
nothing like DBM or COBOL's record-oriented files. There are third-party
libraries, though - see Berkeley DB, Java Edition:
http://www.oracle.com/technology/products/berkeley-db/je/index.html
and JDBM:
http://jdbm.sourceforge.net/
It's also not that hard to write your own fixed-size record manager, and
not that hard to layer variable-sized records on top of such a thing.
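
As a rough illustration of the fixed-size record idea, here is a minimal sketch built on RandomAccessFile. The class name RecordFile and its methods are made up for this example, and it ignores concurrency, caching and variable-sized records entirely.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical fixed-size record store; all names here are illustrative. */
class RecordFile implements AutoCloseable {
    private final RandomAccessFile file;
    private final int recordSize;

    RecordFile(String path, int recordSize) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.recordSize = recordSize;
    }

    /** Reads record number index; throws EOFException if it lies past the end. */
    byte[] read(long index) throws IOException {
        byte[] buf = new byte[recordSize];
        file.seek(index * recordSize);   // a record's offset is just index * size
        file.readFully(buf);
        return buf;
    }

    /** Writes record number index, optionally forcing it to disk. */
    void write(long index, byte[] data, boolean flush) throws IOException {
        if (data.length != recordSize) {
            throw new IllegalArgumentException("record must be " + recordSize + " bytes");
        }
        file.seek(index * recordSize);
        file.write(data);
        if (flush) {
            file.getFD().sync();         // ask the OS to push the write to the device
        }
    }

    /** Number of whole records currently in the file. */
    long count() throws IOException {
        return file.length() / recordSize;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Variable-sized records could then be layered on top by, say, chaining several fixed-size slots together, which this sketch does not attempt.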
There's also an excellent trick for using the unix filesystem as a
database by storing data in symbolic links: the path of a symlink is
actually an arbitrary text string, so you can store information, rather
than an actual path, in it. Gives you hierarchically organised,
string-keyed records of up to a kilobyte (YMMV) without any actual file
IO!
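
A minimal sketch of the symlink trick, assuming a Java version with java.nio.file (Java 7 or later) and a filesystem that allows symlinks. SymlinkStore is a made-up name, and the value is stored as the link's target, which never needs to exist.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical key/value store that keeps each value in a symlink's target. */
class SymlinkStore {
    private final Path dir;

    SymlinkStore(Path dir) {
        this.dir = dir;
    }

    /** Stores value under key. Not atomic: the old link is deleted first. */
    void put(String key, String value) throws IOException {
        Path link = dir.resolve(key);
        Files.deleteIfExists(link);   // removes the link itself, never its target
        // Caveat: the value passes through Path parsing, so it cannot be empty
        // or contain NUL, and redundant slashes in it may be collapsed.
        Files.createSymbolicLink(link, Paths.get(value));
    }

    /** Returns the stored string by reading the link, without following it. */
    String get(String key) throws IOException {
        return Files.readSymbolicLink(dir.resolve(key)).toString();
    }
}
```

Keys containing slashes would give the hierarchical organisation, provided the intermediate directories are created first.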
A performance question for the wise: i hacked up a little fixed-size
record manager, and wrote two backends, one using RandomAccessFile, and
one using a NIO MappedByteBuffer. For both, i provided a way to flush to
disk after each write - with RandomAccessFile, via getFD().sync(), and
with MappedByteBuffer with force(). Timings to do a batch of reads and
writes (100 000 operations, 75% reads, on 10 000 records of 256 bytes
each; a different random pattern each time, on a machine doing nothing but
this and playing MP3s):
Implementation     Flush?   Time (ms)
RandomAccessFile   no             733
RandomAccessFile   yes          20659
MappedByteBuffer   no              63
MappedByteBuffer   yes          33087
The mapped file is an order of magnitude faster without flushing, but about
60% slower with. Any idea why?
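
For anyone wanting to reproduce this, a mapped backend of the kind described might look roughly like the sketch below. This is not the code that was timed: MappedRecordFile and its methods are hypothetical names, and it assumes the whole file fits in a single mapping (under 2 GB).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical fixed-size record store backed by a memory-mapped file. */
class MappedRecordFile {
    private final MappedByteBuffer map;
    private final int recordSize;

    MappedRecordFile(String path, int recordSize, int recordCount) throws IOException {
        this.recordSize = recordSize;
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel channel = raf.getChannel()) {
            // A READ_WRITE mapping longer than the file extends the file in practice.
            map = channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) recordSize * recordCount);
        }
        // The mapping stays valid after the channel that created it is closed.
    }

    /** Copies record number index out of the mapping. */
    byte[] read(int index) {
        byte[] buf = new byte[recordSize];
        ByteBuffer view = map.duplicate();   // independent position, same memory
        view.position(index * recordSize);
        view.get(buf);
        return buf;
    }

    /** Copies data into record number index, optionally forcing it to disk. */
    void write(int index, byte[] data, boolean flush) {
        if (data.length != recordSize) {
            throw new IllegalArgumentException("record must be " + recordSize + " bytes");
        }
        ByteBuffer view = map.duplicate();
        view.position(index * recordSize);
        view.put(data);
        if (flush) {
            map.force();   // called on the whole mapping, not just this record
        }
    }
}
```

Note that force() here is called on the entire mapped region, whereas getFD().sync() is called on the file descriptor; whether that accounts for the gap is a guess.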
tom
[1] Okay, so seriously heavyweight ones use disk extents/partitions and
bypass the filesystem; how much of a difference does that make?
--
.... but when you spin it it looks like a dancing foetus!

