I'm making backup copies of old video games with CloneCD 5.3.3.0 on my Windows 10 x64 computer with a Samsung SH-S223L drive.

One of them is Hellfire for PC (Diablo 1 expansion):

  • The disc has a COMPACT disc DATA STORAGE logo
  • Serial number: S0011770
  • Factory SID-Code: IFPI 1218
  • CD-Master SID-Code: IFPI L032
  • ISO 9660 PVD creation date: 1997-11-18 16:30:00.00

I use the redump.org CloneCD profile recommendation:

[CloneCD ReadPrefs]
ReadSubData=1
RegenerateData=0
ReadSubAudio=1
AbortOnReadError=0
FastErrorSkip=0
ReadSpeedData=8
ReadSpeedAudio=8
IntelligentBadSectorScan=1
SectorSkip=1
NoErrorReport=0
FirstSessionOnly=0
AudioQuality=3

As far as I know the game has no copy protection, but when I dump the disc twice I end up with different subchannel files (.sub). The .ccd and .img files are identical; only the .sub differs. I verified this with SHA1 checksums and a hex editor.
I uploaded two .sub file dumps here.
I have to mention that I own two copies of this disc and the behavior is identical with both discs.

I also dumped several other CD-ROMs. Sometimes I get this behaviour, and sometimes the subchannel is consistent across dumps.

What is the explanation of this behaviour?


Edit:

I dumped the same CD-ROM again with a Lite-On iH124-14 drive and I see the same behaviour (different .sub files).
I also checked the medium for errors with KProbe 2 and I get the following result:

KProbe 2 BLER scan


Edit:

It seems that the disc condition and/or the drive's lack of precision, combined with the fact that the subchannel has no error control mechanism (except for the Q channel), explains why I get different .sub files when dumping the same medium multiple times.

I should mention that I also got a Plextor PX-712A drive and managed to get consistent .sub files across dumps by using Disc Image Creator. This software uses 0xD8 commands instead of 0xBE commands to read the disc, resulting in more accurate images. Only a few drives (mostly Plextor) support this command.

Also I actually own two physical copies of this CD-ROM I'm dumping (same serial number, same IFPI codes and same laser engraved information). If I dump the same disc multiple times with Disc Image Creator I get consistent .sub files but not if I dump the first disc and then the second disc.
I guess it's related to the media conditions since one of them has a few scratches and more C1/C2 errors.

  • read errors (dirt, scratches, not necessarily actual errors from the drive) can cause CDROM images to differ. differences might be as little as a few bits; 1 bit difference is enough for SHA*/MD5 checksums to differ. – quixotic Mar 18 '17 at 18:08

The various CD formats are a bit involved, and the official specifications ("red book" for audio CD, "yellow book" for data CD) are not freely available. But you can find some details in available standards like Ecma-130.

The original audio CD (also called CD-DA) was modelled on the vinyl record, which means it also uses a spiral track of continuous audio data. Interleaved within this audio data in a very complex way are 8 subchannels (P to W), of which the Q subchannel contains timing information (literally in minutes/seconds/frames, a frame being 1/75 of a second) and the current track number. For the original purpose this was enough: for continuous play, the lens was just adjusted slightly to follow the track. To seek, the lens would move while decoding the Q subchannel until the right track was found. This positioning is a bit coarse, but completely adequate for listening to music.

Still today, many computer CD drives cannot completely accurately position the lens and synchronize the decoding circuitry so that reading of audio samples starts at an exact position. This is why many CD ripping programs have a "paranoia" mode, where they do overlapping reads and compare the results to adjust for this "jitter". As part of the audio stream, the subchannel is also subject to jitter, and that is why you get different subchannel files when you rip on a CD drive that cannot position accurately.

When the data CD (CD-ROM) specification was developed to extend the CD-DA specification, the importance of addressing and reading data accurately was recognized, so the 2352-byte audio frame was subdivided into 12 sync bytes and 4 header bytes (for the sector address), leaving the remaining 2336 bytes for data and an additional level of error correction. Using this scheme, sectors can be addressed exactly without having to rely on the Q channel information only. Therefore the jitter effect doesn't apply: you always get the same data when you dump a CD-ROM, and no additional cleverness in dumping is needed.
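The sector layout described above can be illustrated with a small Python sketch. It assumes a 2352-byte raw sector as it appears in a descrambled image file (for example one sector of an .img file), and that the three address bytes are BCD-coded minute/second/frame values offset by the usual 150-frame (2-second) lead. The function names are made up for this example.

```python
# 12-byte sync pattern that opens every raw CD-ROM sector
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"


def bcd_to_int(value):
    """Decode one BCD byte, e.g. 0x16 -> 16."""
    return (value >> 4) * 10 + (value & 0x0F)


def parse_sector_header(raw):
    """Return the address and mode stored in a raw 2352-byte sector."""
    if len(raw) != 2352:
        raise ValueError("expected a 2352-byte raw sector")
    if raw[:12] != SYNC:
        raise ValueError("sync pattern not found")
    minute, second, frame = (bcd_to_int(b) for b in raw[12:15])
    mode = raw[15]
    # 75 frames per second; logical block 0 sits at 00:02:00
    lba = (minute * 60 + second) * 75 + frame - 150
    return {"msf": (minute, second, frame), "lba": lba, "mode": mode}
```

Because every sector carries its own address, a drive can verify that it delivered exactly the sector that was requested, which is not possible for the subchannel stream.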

Edit with more details:

According to Ecma-130, the data is scrambled in stages: 24 bytes make up an F1-Frame; the bytes of 106 of these frames are distributed into 106 F2-Frames, which get 8 extra bytes of error correction. Those frames in turn each get an extra byte ("control byte") to make them into F3-Frames. The extra byte contains the subchannel information (one subchannel for each bit position). A group of 98 F3-Frames is called a section, and the 98 associated control bytes contain two sync bytes and 96 bytes of real subchannel data, i.e. 96 bits per subchannel. The Q subchannel in addition carries a 16-bit CRC within its 96 bits; a CRC can detect errors but not correct them.
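To make the "one subchannel per bit position" idea concrete, here is a minimal Python sketch. It assumes the 96 control bytes of one section are available in raw (interleaved) form with P in the most significant bit and W in the least significant bit, and that the Q CRC uses the polynomial x^16 + x^12 + x^5 + 1 with the stored check bits inverted, which is my reading of Ecma-130. Some tools, CloneCD included as far as I know, store .sub files already de-interleaved, so check your file format before applying this.

```python
def deinterleave(control_bytes):
    """Split 96 control bytes into the 8 subchannels P..W (12 bytes each)."""
    if len(control_bytes) != 96:
        raise ValueError("expected 96 control bytes")
    channels = {}
    for index, name in enumerate("PQRSTUVW"):
        bit = 7 - index  # P is bit 7, W is bit 0
        out = bytearray(12)
        for i, byte in enumerate(control_bytes):
            if (byte >> bit) & 1:
                out[i // 8] |= 0x80 >> (i % 8)
        channels[name] = bytes(out)
    return channels


def crc16(data):
    """CRC-16 with polynomial 0x1021, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def q_crc_ok(q):
    """Check the 16-bit CRC in the last 2 of the 12 Q bytes (assumed inverted)."""
    if len(q) != 12:
        raise ValueError("expected 12 bytes of Q data")
    stored = int.from_bytes(q[10:12], "big")
    return (crc16(q[:10]) ^ 0xFFFF) == stored
```

A failed check only says that something in those 96 bits is wrong; it does not say which bit, and the other seven subchannels have no check at all.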

The idea behind this is to distribute data on the surface of the disc in such a manner that scratches, dirt etc. don't affect many contiguous bits, so the error correction can recover the lost data as long as the scratches are not too big.

As a consequence, the CD drive hardware needs to read a complete section after repositioning the lens to find out where it is in the data stream. The descrambling of the various stages is done by hardware, which needs to sync itself to the 2 sync bytes in the control-byte stream. Different CD drive models need different amounts of time to sync (you can test that by reading from two different drives, if you have them), depending on how the hardware is implemented. Also, many models don't always take exactly the same time to sync, so they can start a little early or late, and the descrambled output doesn't always start at the same byte.

So when the ripping program issues a READ CD (0xBE) command, it supplies a transfer length and a start address (or rather, Q-channel time). The drive positions the lens, descrambles the frames, extracts the Q-channel, compares the time, and when it finds the correct time, it starts to transfer. This transfer doesn't always begin at the same byte as explained above, so the result of multiple READ CD commands may be shifted against each other. That's why you see different subchannel files from your ripper.
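For illustration, this is roughly what the 12-byte READ CD command block could look like when built in Python. It is a sketch based on my reading of the MMC command layout, not something taken from CloneCD: the flag byte 0xF8 (sync, headers, user data, EDC/ECC) and the subchannel selection value 0x01 (raw P to W) are assumptions you should check against the MMC specification, and `build_read_cd_cdb` is a made-up helper name. Actually sending the block to a drive needs an OS-specific pass-through interface, which is not shown.

```python
import struct


def build_read_cd_cdb(lba, sector_count):
    """Build a READ CD (0xBE) command block for raw sectors plus raw subchannel."""
    if not 0 <= lba < 1 << 32:
        raise ValueError("lba out of range")
    if not 0 <= sector_count < 1 << 24:
        raise ValueError("sector_count out of range")
    return bytes([
        0xBE,                               # operation code: READ CD
        0x00,                               # expected sector type: any
        *struct.pack(">I", lba),            # starting logical block address
        *sector_count.to_bytes(3, "big"),   # transfer length in sectors
        0xF8,                               # assumed: sync + headers + user data + EDC/ECC
        0x01,                               # assumed: raw P-W subchannel data
        0x00,                               # control byte
    ])
```

Note that the command only names a start address and a length. Nothing in it lets the host demand that the subchannel bytes be aligned to a particular section boundary; that is left to the drive.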

Depending on the hardware and the circumstances when the lens is adjusted, it's more or less random if the transfer starts a few samples early or a few samples late. So the only pattern you'll see in the results is that the shifts are a multiple of the transfer length.
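If two dumps really are shifted copies of each other, you can look for the shift directly. The following Python sketch takes a window of bytes from one dump and searches for it at nearby offsets in the other. The function name and the default search range are arbitrary choices for this example, and the window should come from a region with distinctive content, because long runs of zeros (common in the R to W channels) match at every offset.

```python
def find_shift(a, b, probe_start=4096, window=256, max_shift=4096):
    """Return s such that b[i + s] == a[i] over the probe window, or None.

    Candidates are tried in order of increasing distance from zero, so the
    smallest shift wins when several offsets match.
    """
    if probe_start < max_shift:
        raise ValueError("probe_start must be at least max_shift")
    probe = a[probe_start:probe_start + window]
    if len(probe) < window:
        raise ValueError("first dump is too short for the probe window")
    candidates = sorted(range(-max_shift, max_shift + 1), key=abs)
    for shift in candidates:
        start = probe_start + shift
        if b[start:start + window] == probe:
            return shift
    return None
```

Running this on several windows taken from different places in the two files shows whether the shift is constant or changes from one read command to the next.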

Some drive models actually have accurate hardware which will always start the transfer at the same position. The standard defines a bit in mode page 0x2a ("CD/DVD Capabilities and Mechanical Status Page") which indicates whether that is the case, but real-world experience shows that some drives claiming to be exact are in fact not. (Under Linux, you can use sg_modes from the sg3-utils package to read the mode pages; I don't know what tool to use under Windows.)

  • Thanks for your answer, it gives me some interesting context. I understand that I don't need the subchannel to have proper data from the disc; I'm just wondering why the subchannel itself is not consistent across dumps. – Christophe Mar 18 '17 at 23:00
  • Yes, I tried to explain why the subchannel is not consistent: You send commands to the disk to read "raw" data including subchannels, and the positioning is not precise, so it can happen that reading starts at different points. If you'd compare the data you read, you'd see that parts are just shifted. OTOH, the CD-ROM data itself doesn't have this problem. And you need the context to understand why the positioning is not exact (though you'd need even more context for the exact reason, which I didn't go into). – dirkt Mar 19 '17 at 7:35
  • I'm interested in knowing the exact reason if it's possible. I added a download link to .sub files in my question. I compared it with a hex editor and you're right the data is shifted, I can't find any obvious pattern though. – Christophe Mar 19 '17 at 10:37
  • Very interesting, thanks. I installed cygwin and sg3-utils, and ran sg_modes. I have 0x2a in the "MM capabilities and mechanical status (obsolete)" section. I will receive a new Liteon drive tomorrow and test again to see if I get consistent subchannel data across dumps. – Christophe Mar 19 '17 at 18:37
  • The presence of the mode page doesn't mean anything, you have to look at the right bit (bit 1 of the 6th byte, "CD-DA stream is accurate"). If you have two drives, grab an audio CD, rip it on both drives and compare the data. You should see different offsets where the actual non-zero data starts. You'll probably also see different offsets for the subchannel files between the two drives. – dirkt Mar 19 '17 at 19:00
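The bit test mentioned in the comments can be written down as a short Python sketch. It assumes you have already extracted the bytes of mode page 0x2a, starting at the page-code byte (so any mode parameter header and block descriptors are stripped), for example by copying the hex dump from sg_modes. The function name is made up for this example.

```python
def cdda_stream_is_accurate(page):
    """Return True if mode page 0x2a claims an accurate CD-DA stream."""
    if len(page) < 6:
        raise ValueError("mode page is too short")
    if (page[0] & 0x3F) != 0x2A:
        raise ValueError("not mode page 0x2a")
    # 6th byte of the page (index 5), bit 1: "CD-DA stream is accurate"
    return bool(page[5] & 0x02)
```

As noted in the answer, a drive that sets this bit is only claiming accuracy; comparing actual rips is still the real test.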

According to this Wikipedia article

A frame comprises 33 bytes, of which 24 bytes are audio or user data, eight bytes are error correction (CIRC-generated), and one byte is for subcode.

This suggests there is no error correction for subchannel.

I have also found another question elsewhere. It's about audio CDs but I think it addresses the right issue:

All I can say is that I've never managed to obtain two identical subchannel readings (*.SUB file) when reading from the same CD-DA/CD-TEXT. Is that normal when reading in RAW mode because data isn't corrected because CD-DA/CD-TEXT format doesn't carry EDC/ECC in all subchannels?

The answer there:

Only audio data is subjected to Reed-Solomon coding (C1 & C2). Subcode channel data (channels P...W) are not subjected to interleaving or error protection.

While dirkt may be right in another answer to your question that you may not need .sub files, the answer doesn't explicitly address your question:

What is the explanation of this behavior?

My answer: you get different .sub files because subchannels don't have error correction. Read errors are corrected (or at least detected) while reading audio or user data, but a read error can pass as-is when it occurs at subchannel bit. Particular errors due to scratches or dust may appear during one reading session, not appear during another etc. – hence .sub files that differ.
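To see whether two dumps differ by only a handful of bits, as read errors would suggest, you can count the differences instead of just comparing checksums. Here is a minimal Python sketch; it assumes both .sub files have the same length and hold 96 bytes of subchannel data per sector, and `compare_sub` is a name invented for this example.

```python
def compare_sub(a, b, sector_size=96):
    """Return (list of differing sector numbers, total number of differing bits)."""
    if len(a) != len(b):
        raise ValueError("dumps have different lengths")
    differing_sectors = []
    bit_diffs = 0
    for offset in range(0, len(a), sector_size):
        sa = a[offset:offset + sector_size]
        sb = b[offset:offset + sector_size]
        if sa != sb:
            differing_sectors.append(offset // sector_size)
            # XOR exposes the differing bits; count the 1s
            bit_diffs += sum(bin(x ^ y).count("1") for x, y in zip(sa, sb))
    return differing_sectors, bit_diffs
```

Usage would be along the lines of `compare_sub(open("dump1.sub", "rb").read(), open("dump2.sub", "rb").read())`, with the file names standing in for your own. A few isolated bits scattered over the disc point to read errors, while long runs of differing sectors point to something more systematic.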


Answer expanded to address the comment:

I have two copies of this disk one being in excellent condition (no visible scratch) and the behavior is still the same. I also have other older game CD-ROMs in worse condition that have consistent .sub file across multiple dumps.

I suspect (unfortunately without hard evidence) that different CDs may have been manufactured to different quality levels. Where subchannels don't matter, a lower-quality disk may still pass quality tests designed to detect inconsistency in the data only. Or it may simply be a matter of probability: one disk has its weak spot(s) (a bit that gives inconsistent readings) where error correction can fix it; another happens to have it in the subchannel area.

One such subchannel bit is enough to give you different checksums, while even thousands of "undecided" bits in the user data area may be silently corrected when needed, as long as they are spread out enough that the error correction algorithm never has to deal with too many of them at a time.


Answer expanded in reaction to KProbe 2 results.

As far as I know C1 errors are allowed (to some quantity) because they are silently corrected (more here). This correction works because of error correction bits. As I said before, subchannels don't have such a redundancy in general (dirkt mentions Q-subchannel CRC error correction but that doesn't change much in my conclusion). Moreover if the error occurs there, there is no way to know it, unless you know beforehand what the correct subchannel data is.

So you had a total of 1855 errors you know about. Repeat the test (seriously, do it!) and you may have e.g. 1790 errors; or 1892. Yet the corrected output is the same every time you read.

If there is one subchannel bit for every 32 data bits, then I'd say you probably have about 1855/32 subchannel bits that were read with an undetected error. That's about 58 bits. Well, almost, because thanks to the Q-subchannel CRC some of these errors may at least be detected. Since Q is one of eight subchannels, I estimate you are left with about 50 erroneous bits in the other subchannels. Next time you read, you may get a few of these bits without an error, and a few new subchannel errors elsewhere. So you will get a different .sub file. And still you won't know for sure which of those bits were read correctly the first time or the second.
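The back-of-the-envelope estimate above, spelled out as a Python sketch. It keeps the same rough assumptions as the text (one subchannel bit per 32 data bits, errors spread evenly over the eight subchannels) and is only meant to show where the numbers 58 and 50 come from; the function name is invented.

```python
def estimate_subchannel_errors(reported_errors, data_bits_per_sub_bit=32, channels=8):
    """Return (estimated subchannel bit errors, estimate excluding Q)."""
    in_subchannels = reported_errors / data_bits_per_sub_bit
    # Q is one of the eight subchannels and is the only one with a CRC
    outside_q = in_subchannels * (channels - 1) / channels
    return in_subchannels, outside_q


total, outside_q = estimate_subchannel_errors(1855)
print(f"about {total:.0f} subchannel bit errors, about {outside_q:.0f} outside Q")
```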

  • First of all, thanks for your answer. I understand that the condition of the medium has to be taken into consideration, but I have two copies of this disk, one being in excellent condition (no visible scratch), and the behavior is still the same. I also have other older game CD-ROMs in worse condition that have consistent .sub files across multiple dumps. I'm aware that I don't need the subchannel given that the game is not protected; I'm asking this question out of technical curiosity :). – Christophe Mar 18 '17 at 22:49
  • @Christophe I have expanded my answer. – Kamil Maciorowski Mar 18 '17 at 23:52
  • I understand. I think it could be interesting to have error information for the medium, so I ordered a Liteon iHAS124 drive and will use kprobe2 to check this. I should have an update on this tomorrow. – Christophe Mar 19 '17 at 10:44
  • I added the C1 error scan result to my question, it seems to be good, max is 25. – Christophe Mar 20 '17 at 18:45
  • @Christophe I have expanded my answer again. – Kamil Maciorowski Mar 20 '17 at 19:59
