
Database archive

Scidb's database archive is based on an independent archive format with the general file suffix ".ive"; Scidb itself uses the special file suffix ".scv". The archive format stores information about each included file, such as its name, size, and compression method.

This archive format has the following syntax (simplified EBNF):

Archive     = Magic [ ArchInfo ] [ UserInfo ] { File } ;
Magic       = 'iveArch' LF ;
File        = Header BinaryData ;
Header      = HeadDelim LF HeaderInfo DataDelim ;
HeadDelim   = LF '<-- H E A D -->' ;
DataDelim   = '<-- D A T A -->' LF ;
HeaderInfo  =
  '<FileName>' { Space } FileName LF
  [ '<URI>' { Space } Uri LF ]
  [ '<FileSize>' { Space } Number LF ]
  [ '<Size>' { Space } Number LF ]
  [ '<Compression>' { Space } Compression LF ]
  [ '<Checksum>' { Space } Crc32Number LF ]
  [ '<Modified>' { Space } Timestamp LF ]
  [ '<Encoding>' { Space } Encoding LF ] ;
Space       = HT | " " ;
Compression = 'raw' | 'zlib' | 'lzo' ;
BinaryData  = { Byte } ;
ArchInfo    = '<TotalSize>' { Space } Number ;
UserInfo    = LF { Attribute LF } Attribute ;
Attribute   = AttrName { Space } AttrValue ;
AttrName    = "<" Identifier ">" ;
Identifier  = AsciiLetter { AsciiLetter } ;
AttrValue   = AsciiGraphical { AsciiGraphical } ;
Timestamp   = Date " " Time ;
Date        = Year "-" Month "-" Day ;
Time        = Hour ":" Minute ":" Second ;
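Read literally, the grammar above can be followed step by step to assemble a small archive. The following Python sketch does this for a single uncompressed file. The function name `build_archive` is hypothetical, LF is assumed to be the byte 0x0A, and the header text is assumed to be UTF-8; only a subset of the optional attributes is written.

```python
def build_archive(file_name: str, data: bytes) -> bytes:
    """Assemble a one-file archive with 'raw' data (illustrative sketch)."""
    # HeaderInfo: <FileName> is mandatory, the other attributes are optional.
    header_info = (
        f"<FileName> {file_name}\n"
        f"<Size> {len(data)}\n"
        f"<Compression> raw\n"
    ).encode("utf-8")  # assumption: header text is UTF-8
    return b"".join([
        b"iveArch\n",                                # Magic = 'iveArch' LF
        f"<TotalSize> {len(data)}".encode("ascii"),  # ArchInfo (no LF of its own)
        b"\n<-- H E A D -->",                        # HeadDelim = LF '<-- H E A D -->'
        b"\n",                                       # LF that follows HeadDelim
        header_info,
        b"<-- D A T A -->\n",                        # DataDelim
        data,                                        # BinaryData
    ])
```

For raw data the stored size and the unpacked size are the same, which is why `<TotalSize>` and `<Size>` carry the same number in this sketch.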

<FileName> is any valid, platform-independent file name without a preceding path. A platform-independent file name cannot contain the characters "<" and ">".

<URI> is a uniform resource identifier, for example file:///home/chris/Chess/MyBase.sci. The decoder decides how the preceding path specification is used.

<Size> is the size of the file data as stored inside the archive.

<FileSize> is the size of the unpacked file. The internal size (<Size>) is also the offset to the next file header.
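Because the stored size doubles as an offset, a reader can jump over a data section without scanning it for delimiters. A minimal Python sketch, assuming the archive is held in memory and `data_start` points just past the data delimiter line (the function name and parameters are hypothetical):

```python
def split_stored_data(archive: bytes, data_start: int, size: int) -> tuple[bytes, int]:
    """Return the stored bytes of one file and the offset of what follows."""
    end = data_start + size
    if end > len(archive):
        raise ValueError("archive is shorter than the size given in the header")
    return archive[data_start:end], end
```

The returned offset is where the next header delimiter is expected, or the end of the archive if this was the last file.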

<TotalSize> contains the sum of the sizes of all unpacked files. This attribute may be omitted if the sum is unknown.

<Checksum> is a checksum of the file data as stored inside the archive, computed with the CRC32 method.
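A decoder could verify the checksum roughly as follows. This Python sketch assumes that the value is written as an unsigned decimal number (as in the sample archive in this document) and that "CRC32" means the common CRC-32 implemented by zlib; neither point is confirmed by the text. The function name is hypothetical.

```python
import zlib

def checksum_matches(stored: bytes, checksum_field: str) -> bool:
    """Compare the CRC-32 of the stored data with the <Checksum> value."""
    # Mask to 32 bits so the result is always an unsigned number.
    actual = zlib.crc32(stored) & 0xFFFFFFFF
    return actual == int(checksum_field)  # assumption: decimal notation
```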

<Modified> contains the last modification timestamp (GMT). The decoder may use it to restore the timestamp of the extracted file.
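As an illustration, the timestamp could be parsed and applied to an extracted file like this. The Python sketch assumes four-digit years and two-digit fields, as in the sample archive; the function names are hypothetical.

```python
import os
from datetime import datetime, timezone

def parse_modified(value: str) -> datetime:
    """Parse a <Modified> value such as '2012-02-21 18:31:12' as GMT/UTC."""
    naive = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)

def restore_mtime(path: str, value: str) -> None:
    """Set access and modification time of an extracted file."""
    seconds = parse_modified(value).timestamp()
    os.utime(path, (seconds, seconds))
```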

<Encoding> specifies the character set.

The attributes <FileName>, <URI>, <FileSize>, <Size>, <Compression>, <Checksum>, and <Modified> may appear in any order. The default for the attribute <Compression> is 'raw' (which means no decompression is required).
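Since the order is free, a decoder is likely to collect the header lines into a mapping first and interpret them afterwards. The Python sketch below does that and applies the 'raw' default. It assumes that the header text is UTF-8 and that 'zlib' denotes a stream readable by `zlib.decompress`; 'lzo' is left out because the Python standard library has no LZO codec. All names are hypothetical.

```python
import re
import zlib

# '<Name>' followed by optional spaces/tabs and the value.
ATTR_LINE = re.compile(rb"<([A-Za-z]+)>[ \t]*(.*)")

def parse_header_info(block: bytes) -> dict[str, str]:
    """Parse the lines between the head and data delimiters, in any order."""
    attrs = {"Compression": "raw"}  # default stated in the text
    for line in block.split(b"\n"):
        if not line:
            continue
        match = ATTR_LINE.fullmatch(line)
        if match is None:
            raise ValueError(f"not an attribute line: {line!r}")
        attrs[match.group(1).decode("ascii")] = match.group(2).decode("utf-8")
    if "FileName" not in attrs:
        raise ValueError("mandatory <FileName> is missing")
    return attrs

def unpack(stored: bytes, attrs: dict[str, str]) -> bytes:
    """Undo the compression named in the header (sketch: raw and zlib only)."""
    method = attrs.get("Compression", "raw")
    if method == "raw":
        return stored
    if method == "zlib":
        return zlib.decompress(stored)  # assumption about the stream format
    raise NotImplementedError(f"compression {method!r} is not handled here")
```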

If <URI> refers to an external resource (for example http://*), <Size> may not be available, and <FileSize> may be unknown as well. In this case the data section is usually empty (an empty line). File data may nevertheless be included, and should then be used if the external resource is unavailable; in that case the attribute <Size> has to be specified anyway.

Magic is chosen so that the following points are ensured:

  • Readability with a text editor.
  • No confusion with an ordinary text file.

Note: The whole archive format is designed to be readable and editable with any suitable text editor (one which allows binary data).

In ".scv" format the attribute UserInfo has the following definition:

UserInfo    = { LF Attributes } ;
Attributes  =
  "<Format>" { Space } Formats
  "<Type>" { Space } Type
  [ "<Count>" { Space } Number ] ;
Type        = 'single' | 'multi' ;
Formats     = Format { "," Format } ;
Format      = 'sci' | 'si3' | 'si4' |
              'cbh' | 'cbf' | 'pgn' |
              'pgn.gz' | 'bpgn' | 'bpgn.gz' ;

<Count> is the number of games in the database. If this number is unknown (for example in the case of PGN files), this attribute is omitted.

<Format> describes the format of the extracted database.

In general an archive contains only one database (<Type> = 'single'). Several databases may, however, be packed into one archive. In this case <Type> should have the value 'multi', and <Format> contains a comma-separated list of the formats of all included databases. Moreover, <Count> is then the total number of games in all databases.

The order of the attributes <Count>, <Format>, and <Type> is arbitrary.
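The ".scv" attributes could be interpreted along the following lines once they have been collected into a mapping. This Python sketch assumes the attribute names are the keys without angle brackets; the function name and the shape of the result are hypothetical.

```python
KNOWN_FORMATS = {
    "sci", "si3", "si4", "cbh", "cbf",
    "pgn", "pgn.gz", "bpgn", "bpgn.gz",
}

def interpret_scv_info(attrs: dict[str, str]) -> dict:
    """Validate <Format>, <Type> and the optional <Count> of a .scv archive."""
    formats = attrs["Format"].split(",")
    unknown = [name for name in formats if name not in KNOWN_FORMATS]
    if unknown:
        raise ValueError(f"unknown database format(s): {unknown}")
    kind = attrs["Type"]
    if kind not in ("single", "multi"):
        raise ValueError(f"unknown archive type: {kind!r}")
    # <Count> is optional; None stands for "number of games unknown".
    count = int(attrs["Count"]) if "Count" in attrs else None
    return {"formats": formats, "type": kind, "count": count}
```

Using a mapping as input makes the arbitrary attribute order irrelevant to the check.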

Finally, two simple examples of archives:

<TotalSize> 468
<Count> 1
<Format> pgn
<Type> single
<-- H E A D -->
<FileName> one-game.pgn
<Size> 468
<Compression> raw
<Checksum> 3225351655
<Modified> 2012-02-21 18:31:12
<-- D A T A -->
[Event "London knockout"]
[Site "London"]
[Date "1851.05.27"]
[Round "1.1"]
[White "Staunton, Howard"]
[Black "Brodie, Alfred"]
[Result "1-0"]
[EventDate "1851.05.27"]
[EventCountry "ENG"]
[EventType "k.o."]
[EventRounds "3"]
[ECO "C44"]
[TimeMode "normal"]
[Source "Tournaments"]
[SourceDate "2004.03.11"]

1.e4 e5 2.Nf3 Nc6 3.d4 exd4 4.Bc4 Bb4+ 5.c3 dxc3 6.O-O
Qf6 7.e5 Qe7 8.a3 cxb2 9.Bxb2 Bc5 10.Nc3 d6 11.Nd5 Qd8
12.exd6 Bxd6 13.Bxg7 Bg4 14.Re1+ Nge7 15.Nf6# 1-0

<Format> pgn
<Type> multi
<-- H E A D -->
<FileName> tiny-1.pgn
<-- D A T A -->

<-- H E A D -->
<FileName> tiny-2.pgn
<-- D A T A -->