Database design

Topics: Developer Forum
Nov 6, 2006 at 1:23 PM
Looking at the source code, it seem that it uses a huge XML file for the library database. I figured that my library (600 Albums/9000 songs/30GB) would need a 1.5MB XML file. That's pretty big for an XML file. And piffie's library is twice as big, so it's would need an even bigger file.

The problem is that XML just isn't designed as a database. Nothing is indexed. All searching requires a full scan of the file.

I'd like to recommend that we move it to a full-fledged database. My suggestion would be to use the database Duncan Mackenzie's MusicXP project, which he has already released on GotDotNet. That's designed for SqlServerExpress which is a free download (and I think it's built into Vista). Mackenzie's code already has the directory scan/DB building complete.

Database scripts:

Rest of the source code:

On the other hand, piffie seems to have his own music database design, so I'd like to hear about that.
Nov 6, 2006 at 3:24 PM
My library is about 4500 songs. It parses this from scratch in 40 seconds (the XML can be optimised by loading all of the songs before categorising them into artist and album - this removes all the checks and lookups that are going on). File size is about 750k.

Performance on searches is pretty quick, often less than a second from handheld(search) --> server --> handheld (response)

I started with XML because it was fast and required no extra software to be installed; It also gave me inbuilt support for querying data.

One of the things I really want to do with this is make it accessible and easy to set up and use. Adding a database definitely increases complexity and adds another tier to be maintained during the upgrade process.

I guess the question is what are we going to do with the library? piffie had a number of things listed in his post, it also sounded like he'd made a start on some of this. piffie, what have you got in mind for the music library?

Nov 6, 2006 at 4:31 PM
Now, I assume that the 40 seconds is for loading a pre-built file.

- How long does it take to build the file is the first place?

- More importantly, how long does it take to update the file after changes have been made to the library?

As an XML file, I'd assume updating the file would require shutting down the server, recreating the file from scratch, and then restarting the server when it's down. As a database, the update process can run in the background, while the server is running, with both accessing the library without interferring with each otehr.
Nov 6, 2006 at 9:19 PM
Actually, the 40 seconds is the amount of time it takes to build a new music library xml file from scratch (first load). This is for 4500 tracks (open track, read mp3 tags, process and add to xml doc). This can be optimised (it currently checks to see if an album and artist already exist on a file by file basis - this can be recoded to be more efficient).

Loading a pre-built file is very quick (sub-second).

Updates are added into the server using a filewatcher. When it detects a file has been added it adds the new node to the document.

A complete rebuild occurs on a different thread so it doesn't affect the music server. When the rebuild is complete it simply replaces the old doc with the new one. There is no music server downtime.
Nov 6, 2006 at 9:44 PM
Damn, that's fast. I would guessed that just scanning the directory tree (without open/reading the files or building the xml) would have taking longer than that. Do you have your music files in one big folder or in a \{artist}\{album}\ hierarchy?

I'l have to try actually building the XML file on my system. IF this works out, it seems that the XML file may be workable. THere will eventually be a scalabilty problem, but it would seem that would require a truly massive music library to hit.
Nov 6, 2006 at 10:00 PM
Nope, I've got it in a Artist/Album folder hierarchy so it's jumping between folders. It does a recursive scan of the music library root directory and parses every file in the tree that has an mp3 extension.

The good news is that these are .Net 1.1 timings and the .Net 2.0 XML processor is twice as fast as the .Net 1.1. processor.

Still, worth seeing what piffie has done as it sounds like it's much more advanced than than the xml-based approach I used.
Nov 8, 2006 at 10:25 AM
my library have the object structure serialized (binary). have not the right timings in my head, but the complete library took as a file about 30MB. thats because there are also some id3 data in it. and thats also why the scanning takes long.
thats really depending on the harddisc. since to read ID3, it has to open the file. if i have the file in the HD cache (on hybrid discs) or on an usb stick, i get about 100 files per second. if its an slow, fragmented disc it gets about 20 files per second.

scanning folders does not matter. in ntfs the filesystem lookup for files under an specific directory is pretty fast.

the loading and saving of the 30MB file are just 15-20 seconds. i'll try to make an demo client of my lib and post it into the tfs, so you can try it out.

and also, i have another serializer i did'nt try out, that should be faster than this. maybe i can implement it also.
Nov 9, 2006 at 8:15 AM
How do you query the library for artists/tracks/albums/tags/etc?

I'd be really interested to hear how that works.
Nov 9, 2006 at 6:50 PM
its all done via .net generics.

i'll upload an demo code. but dont really want to share the real source since its not good done. you can look into anyway with the reflector.
i'll get some time next week to look into the code and how to combine it easily.
i added also my library cache there. just delete MediaLib.dat file and you can scan an new one.

also this code is older and untouched. did'nt wrote it failsave :-)

Nov 9, 2006 at 6:53 PM
ähm. maybe some more info.
the download is 2,6 megs big.
after the run of the pifplay.exe just type in what you want to search for in the textbox. in the menu you can add another folder to the library.
i also have some extended controls for another project that has automatic update implemented on file changes.