John Kaster

Behind the Screen

How to repair a corrupted Blackfish SQL database

We use Blackfish SQL as the database for our discussion forums. Although the Java version (formerly known as JDataStore) supports mirroring and fail-over, the .NET version has never been certified for that functionality. Sometimes when we have hardware failures in our data center, our database ends up corrupted. Fortunately, I’ve found an easy way to repair this corruption that, so far, has worked all three times I’ve had to use it.

There is a very nice, free (with an option to upgrade) hex editor available called “Hex Editor Neo” that I used for the repair job, after discussing the issue with Adrian Andrei. After analyzing the raw file in Neo, it looked like I could copy the first block, from hex 0 to hex 2040, of a good version of the database over the corrupted version and repair the live database. So this is what I did:

1. Select the range 0x0 to 0x2040 from the good backup of the database using Neo, or some other hex file editor of your choice.

[Screenshot “bsqlfixrange”: the range selected in Neo]

2. Edit|Copy it. You should see something like this in Neo:

[Screenshot “bsqlfix1”: Neo after the copy]

3. Select the tab with the corrupted version of the database.

4. Edit|Paste at 0x00 in the corrupted file. You should see the block turn red where it was pasted.

5. Then click Save.

6. Close Neo, and start using your database again.
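For anyone who would rather script this than click through a hex editor, the steps above can be sketched in a few lines of Python. This is only an illustration of the same idea, not something I have run against a production database. The function name `patch_header` and the `.bak` suffix are made up for the example, the paste is assumed to overwrite (not insert), and 0x2040 is treated as an exclusive end offset, which is a guess about how the selection in Neo counts. It also assumes nothing else has the database file open while it runs.

```python
import shutil
from pathlib import Path

# End of the block named in the post. Treating it as exclusive
# (bytes 0x0 .. 0x203F) is an assumption; adjust if your selection differs.
HEADER_END = 0x2040


def patch_header(good_path, corrupted_path, length=HEADER_END, backup_suffix=".bak"):
    """Overwrite the first `length` bytes of the corrupted file with the
    same bytes from a known-good backup. Returns the path of the safety
    copy made of the corrupted file before it is modified."""
    good_path = Path(good_path)
    corrupted_path = Path(corrupted_path)

    # Steps 1-2: "select and copy" the first block of the good file.
    with open(good_path, "rb") as src:
        header = src.read(length)
    if len(header) != length:
        raise ValueError(f"{good_path} is shorter than {length:#x} bytes")
    if corrupted_path.stat().st_size < length:
        raise ValueError(f"{corrupted_path} is shorter than {length:#x} bytes")

    # Keep a copy of the corrupted file so the patch can be undone.
    backup_path = corrupted_path.with_name(corrupted_path.name + backup_suffix)
    shutil.copy2(corrupted_path, backup_path)

    # Steps 3-5: "paste at 0x00 and save". Mode "r+b" overwrites in place
    # and leaves the rest of the file untouched.
    with open(corrupted_path, "r+b") as dst:
        dst.seek(0)
        dst.write(header)

    return backup_path
```

With hypothetical file names, a call would look like `patch_header("forums_good.jds", "forums_live.jds")`. The extra copy of the corrupted file is there because, unlike the hex editor, a script gives you no chance to look at the red block before saving.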

Caveat Emptor

I’m documenting this because it works for me and it’s a quick way to fix the corrupted database. This is a completely unsupported tip. Your mileage may vary. I strongly recommend regularly backing up whatever database you use, and ensuring your network infrastructure has good redundancy so you never encounter this situation (we’re still working on improving ours).

Written by John Kaster

September 9, 2009 at 1:37 pm

10 Responses

  1. OK, this probably sounds like a cheap shot, but it is a totally sincere question: Why don’t you guys put your data in a real ACID database that has proven itself in the real world?

    Jan Derk

    September 9, 2009 at 4:03 pm

  2. Jan’s right – seriously, why are you continuing to use a database with a proven failure rate?!

    Another good question is: Since this is (AIUI) an Embarcadero product, why hasn’t it been updated to better support this not-that-unusual scenario?

    M J Marshall

    September 9, 2009 at 9:37 pm

  3. Jan, M J — I have yet to find a database that can tolerate network failures without getting corrupted sometimes. The issue isn’t the database. The issue is the networking infrastructure it’s running on.

    I’d be very interested in a database that guarantees zero corruption with faulty network connections.

    We use Blackfish SQL for the same reason we use InterBase — because it works, and our use of it helps improve the product for our customers.

    John Kaster

    September 9, 2009 at 9:41 pm

  4. All I can say is yikes.

    I have to admit, this does not paint a pretty picture for Blackfish at all.

    Xepol

    September 9, 2009 at 9:46 pm

  5. Well, I have to say I expected these kinds of responses, but I thought that at least people would get the point of the blog post: how easy it is to repair a corrupted Blackfish SQL database, because corruptions of ANY database are inevitable.

    We have had corrupted databases with MySQL, MSSQL and with Oracle as well, some of which resulted in much lost data. In those cases, the only option was to restore the last good backup. I much prefer being able to update the first block of the Blackfish SQL database and be back up and running with no data loss.

    Corruption happens when hardware fails. I have yet to find a heavily used database that is 100% hardware fault tolerant.

    John Kaster

    September 9, 2009 at 9:51 pm

  6. This raises the question of whether Blackfish is still in active development. Is the .NET version ever going to see mirroring and fail-over? To me it seems that Steve Shaughnessy is no longer with the company, so who is actively working on the product?

    Olaf

    September 9, 2009 at 10:45 pm

  7. John, any decent database handles network failures just fine. That is what Atomicity in ACID stands for. Even a free desktop database like SQLite is ACID compliant these days.

    You guys have Interbase. That certainly looks like a much better solution.

    Jan Derk

    September 10, 2009 at 10:46 am

  8. I’m sorry Jan, but my experience contradicts your statement “any decent database handles network failures just fine.” We have had corruption problems on every database platform we use in the 10+ years we’ve been doing this.

    John Kaster

    September 10, 2009 at 10:48 am

  9. I agree with John: failures happen. I also agree with everybody who says, “Hey, why doesn’t Blackfish have failover and redundancy?” Every time you can save yourself the trouble of recovering from a recent snapshot is good. Also, it would be good if recovery were semi-automated: get a list of good snapshots via some tool, select one, and voilà. Why should a hex editor be used? Seriously. It looks ugly.

    Warren Postma

    September 10, 2009 at 12:30 pm

  10. Of course, corruptions can happen to the most robust ACID databases too (a meteor may hit the hard disk at any time). Corruptions should never happen due to network failures. If they do happen, then you either ran into a database SW bug (highly unlikely if you use a popular DB and are conservative about upgrading) or someone did something really stupid, like disabling the logs.

    On the other hand, if your database is not ACID compliant, corruption due to network failures is quite common.

    The point I want to make is that it is quite naive *not* to use an ACID compliant database in 2009.

    Jan Derk

    September 10, 2009 at 1:20 pm

