mbh neophyte
Joined: 17 Dec 2004 Posts: 21 Location: USA
|
Posted: Mon Jan 31, 2005 10:37 am Post subject: Interview with bavard about swarming/29 bug |
|
|
I had several questions about the swarming system [and the 29 bug].
I have posted the Q&A session here. If people have any more questions, please post them as replies so that follow-up e-mails can be sent out.
From: Fabrice <e-mail withheld>
Here are some replies to what you asked. You can of course distribute them as widely as you need.
First, we are discussing src/daemon/common/commonSwarming2.ml, the second version of the swarming system.
> 1) What is the difference between a block and a chunk?
In my terminology, which is not always consistent, a "chunk" is a segment of data for a particular network. Chunks for edonkey are 9.5 MB segments. They all have the same size. A "block" is a segment of data internally in the swarming system. Blocks don't all have the same size within one file. Suppose you are downloading a file from Edonkey and Bittorrent, where the chunks in BT for that file are 1 MB long. Then the file will be cut internally, so that any block in the file is always completely included in a chunk. For that, you must cut the file every 9.5 MB and every 1 MB, so that you will have:
[1MB] [1MB] [1MB] [1MB] [1MB] [1MB] [1MB] [1MB] [1MB] [0.5MB][0.5MB]
[1MB] [1MB] etc...
The 10th and 11th blocks are 0.5 MB each, because the first 10 blocks must be included in the 9.5 MB Edonkey chunk, and the 10th and 11th blocks together must also be included in a 1 MB BT chunk.
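The cutting rule described above can be sketched in a few lines of OCaml. This is only an illustration, not the code in commonSwarming2.ml: the name `block_sizes` and its labelled arguments are hypothetical, and all sizes are assumed to be positive integers in the same unit.

```ocaml
(* Cut points are every multiple of each network's chunk size, plus the
   end of the file. A block is the span between two consecutive cuts.
   [block_sizes] is a hypothetical helper, not an mldonkey function. *)
let block_sizes ~file_size ~chunk_sizes =
  (* Next cut point strictly after [pos], capped at the end of the file. *)
  let next_cut pos =
    List.fold_left
      (fun best size -> min best ((pos / size + 1) * size))
      file_size chunk_sizes
  in
  let rec loop pos acc =
    if pos >= file_size then List.rev acc
    else
      let cut = next_cut pos in
      loop cut ((cut - pos) :: acc)
  in
  loop 0 []
```

With sizes in KB, `block_sizes ~file_size:11264 ~chunk_sizes:[9728; 1024]` (an 11 MB file with 9.5 MB and 1 MB chunks) gives nine blocks of 1024, two of 512, and a final one of 1024, which is the layout drawn above.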
> 2) What is the difference between the swarmer type and the type t of
> the module system? Analogous to this question, what is the difference
> between a network block and a swarmer block?
The "swarmer" type is the record corresponding to a particular file downloaded on disk. The "t" type is an interface to that record, for a particular network. So, for one file, you will have one "swarmer" record, and several "t" records, one for each network where it is downloaded. One of them is the "primary", i.e. the network whose checksums are used to know if a particular block has been correctly downloaded or not. When all the blocks have been verified with the primary checksums, the file download is finished.
"t" also contains conversion arrays between network chunk numbers and internal swarmer blocks: t.t_t2s_blocks.(i) contains the list of internal block numbers corresponding to the chunk "i" in the network, whereas t.t_s2t_blocks.(i) contains the number of the network chunk containing the internal block "i".
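As a rough illustration of the two conversion arrays, here is a sketch that builds them from block start offsets. The function names and arguments are hypothetical and this is not how commonSwarming2.ml builds them; it only assumes that no block straddles a chunk boundary, so the start offset of a block identifies its chunk.

```ocaml
(* [block_starts.(b)] is the start offset of internal block [b].
   The result maps each internal block to its network chunk number
   (the role of t_s2t_blocks). Hypothetical helper. *)
let s2t_blocks ~chunk_size block_starts =
  Array.map (fun start -> start / chunk_size) block_starts

(* Inverse mapping: for each network chunk, the list of internal blocks
   it contains (the role of t_t2s_blocks). Hypothetical helper. *)
let t2s_blocks ~nchunks s2t =
  let t2s = Array.make nchunks [] in
  (* Walk backwards so each list ends up in ascending block order. *)
  for b = Array.length s2t - 1 downto 0 do
    let c = s2t.(b) in
    t2s.(c) <- b :: t2s.(c)
  done;
  t2s
```

Each network gets its own pair of arrays, computed with that network's chunk size over the same list of internal blocks.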
> 3) What is the purpose of the uploader type?
The "uploader" is a source for a file, with which you are connected and from which you can start downloading chunks. It contains all the data attached to that source that the swarming system needs, such as "up_chunks", the chunks that are available from that source, "up_block", which is the current chunk being downloaded, "up_complete_blocks", the list of complete chunks that can be downloaded, and "up_ranges", the ranges that have already been requested from that client and not received yet.
> Functional questions
> 1) The swarming system seems to only be truly compatible with
> protocols in which files are downloaded in discrete ranges. However,
> many protocols don't implement such ranged downloads: for instance, in
> HTTP, many files do not provide a Content-Length field in the header,
> and in FTP, you can only specify a starting point for a file, not an
> ending point.
> Can the swarming system be used to download such files efficiently?
Currently, such networks declare small chunks (I think it's about 1 MB). The problem is indeed that they must request a range within one chunk, while they could potentially download several chunks in one request. But I think one solution would be to allow several blocks to be requested from an uploader in one request (putting a list in the "up_block" field), and then add a function that would return the longest set of blocks that can be downloaded after the block returned by "find_block".
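The proposed helper does not exist in the code discussed here. A minimal sketch of what it might look like, assuming a boolean array that says which blocks the source can provide and are still needed (both the array and the name `contiguous_blocks` are hypothetical):

```ocaml
(* [wanted.(b)] is true when block [b] is available from the source and
   still missing locally. Returns the longest contiguous run of wanted
   blocks starting at [first], which is assumed to be >= 0 (for example
   the block returned by find_block). Hypothetical helper. *)
let contiguous_blocks wanted first =
  let n = Array.length wanted in
  let rec loop b acc =
    if b < n && wanted.(b) then loop (b + 1) (b :: acc) else List.rev acc
  in
  loop first []
```

The returned list is what would go into a list-valued "up_block" field, so that one request can cover the whole run instead of a single chunk.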
> 2) Is it possible to write files to disk without making use of the
> swarming system?
Yes, but there is no function to extract the file from the "t" record. You could add "let get_file t = t.t_file" in the module, and "val get_file : t -> file" in the interface, so that you can use the returned file with the CommonFile functions.
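To show where the two lines would sit, here is a self-contained toy module. Everything except `get_file` and the `t_file` field is a stand-in invented for the demo (the record types, `make` and `file_name`); the real "t" and "file" types in mldonkey are different.

```ocaml
module Swarming : sig
  type file
  type t
  val make : string -> t            (* hypothetical constructor, demo only *)
  val get_file : t -> file          (* the accessor suggested above *)
  val file_name : file -> string    (* hypothetical, demo only *)
end = struct
  type file = { name : string }     (* stand-in for the real file type *)
  type t = { t_file : file }        (* stand-in for the real "t" record *)
  let make name = { t_file = { name } }
  let get_file t = t.t_file
  let file_name f = f.name
end
```

Because the signature keeps "t" abstract, callers can only reach the underlying file through `get_file`, which is why the accessor has to be exported in the interface as well.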
> 3) There is a mention in the 2.5.29 Changelog about a serious bug.
> Several people (including myself) believe the bug is related to some
> chunks becoming corrupt while upgrading from .28 to .29. Do you have
> any more information about this serious bug, perhaps where in the code
> it occurs, or ways to reproduce it?
If I remember correctly, I think it creates a new file with a different name, and then writes to that file instead of the former one. The result is that you have two files, one with the data downloaded before and the second with the data downloaded after. When the download is finished, it saves only one file, which contains only the data downloaded after, while the chunks which were in the former file are lost (they are not checked again, since they were supposed to be already downloaded correctly). When I noticed that behavior, I decided that the version had not been checked enough before the release, so I wanted to test it more before doing another release...
I hope this will help you. I can go into more detail if you need. In particular, I could write "comments" for functions or fields on demand.
Best regards, - Fabrice |
Amorphous skilled
Joined: 04 Sep 2004 Posts: 316
|
Posted: Mon Jan 31, 2005 3:45 pm Post subject: |
|
|
| So if I understand that correctly: "the bug" is that the naming scheme in temp got changed, and that on upgrade a new file with the new scheme is created and used (instead of moving the old file), so the already verified chunks get left behind in the old file. When the file is finished, the left-behind chunks are still in the old temp file, and so the finished download is corrupt. Can anyone reproduce that? |
mbh neophyte
Joined: 17 Dec 2004 Posts: 21 Location: USA
|
Posted: Mon Jan 31, 2005 4:02 pm Post subject: how to test |
|
|
The best way to test this would be to add a very large file (~500 MB), and perform an upgrade when this file is almost done (say 90%). If, after the file is completed, it goes back down to approx 10%, then this is the cause of the bug. A fix should be trivial.
/me adds a file to the queue |
spiralvoice Sage
Joined: 06 Jan 2003 Posts: 3983 Location: Germany
|
Posted: Tue Feb 08, 2005 11:54 pm Post subject: |
|
|
Hi,
!!! THIS IS A PROPOSAL FOR UPDATING TO 2.5.29 CORES, USE IT AT YOUR OWN RISK !!!
!!! IF YOU LOSE DATA YOU HAVE BEEN WARNED !!!
I finally made some tests. I took a file from my running 2.5.28r temp directory which was 88.7% completed.
This file went into the temp of vanilla CVS 2.5.29, I started the core and it created another file
with "urn:ed2k:" prepended to the filename (=MD4 hash value) which was empty(!).
So original file was named like this: ABABABABA1717171717171, new file urn:ed2k:ABABABABA1717171717171
Logfile showed this:
| Code: | VERIFICATION OF BLOC 0 OF file FAILED
Swarmer block was complete. Removing data... |
The bloc numbers were: 0, 0, 1, 2, 3, ..., 71
What about the two zeros? Maybe the bug?
So this should be the bug b8_bavard mentioned.
| Fabrice wrote: | | If I remember correctly, I think it creates a new file with a different name, and then writes to that file instead of the former one. The result is that you have two files, one with the data downloaded before and the second with the data downloaded after. |
I tried something different, stopped the core, deleted temp, newly copied that temp file
and renamed it to urn:ed2k:<md4>. Started the core but it did not recognize the file,
even after recover_temp. So I dllink´ed the edk-link of that file into the core, it created
a download in vd and files.ini and used the already existing tempfile, puh...
It had 100% availability, I remembered that situation already from the upgrade from 2.5.16 to
2.5.28, so I issued verify_chunks. This is the logfile output:
| Code: | VERIFICATION OF BLOC 14 OF file FAILED
Swarmer block was complete. Removing data... |
Other bloc numbers: 21, 27, 29, 31, 38, 47, 50, 56, 58
So it looks better: 86% of the file was confirmed as already downloaded, down from 88.7%...
Chunks look like this:
in 2.5.29:
| Code: | | 333333333333330333333033333030303333330333333330330333330303333333333333 |
in 2.5.28
| Code: | | 333333333333311133313133333131313133331333333131331333331313313333333333 |
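The two bitmaps can be compared mechanically. The sketch below assumes that '3' marks a verified chunk, which is how the strings are read in this thread; the meaning of the other digits is not stated here and is not relied on. All three function names are made up for the example.

```ocaml
(* Count how many characters of [bitmap] equal [state]. *)
let count_state state bitmap =
  let n = ref 0 in
  String.iter (fun c -> if c = state then incr n) bitmap;
  !n

(* Percentage of chunks marked '3' (assumed here to mean "verified").
   The bitmap is assumed to be non-empty. *)
let verified_percent bitmap =
  100.0 *. float_of_int (count_state '3' bitmap)
  /. float_of_int (String.length bitmap)

(* Indices where two bitmaps of the same length differ, for example the
   same file before and after an upgrade. *)
let changed_chunks before after =
  let acc = ref [] in
  String.iteri (fun i c -> if c <> after.[i] then acc := i :: !acc) before;
  List.rev !acc
```

Applied to the 2.5.29 string above, `verified_percent` counts 62 of 72 chunks as verified, which agrees with the 86% quoted, and the ten '0' positions are the bloc numbers that failed verification.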
So, to keep it short:
unshare all your shared directories to keep links list clean of shared files
restart the not-2.5.29 core because of this bug
issue "links" command and copy the output somewhere safe.
Create a new dir with a 2.5.29 core, start it and stop it.
Copy (or move if you are brave) your old temp files into the new temp directory.
Rename all files so they start with "urn:ed2k:" followed by the old name
Start the 2.5.29 core and check whether "dllinks" works for you to restart the downloads.
If not, you have to put all ed2k links manually into the core...
Remember to run verify_chunks on all files and you should be up and running...
You will lose all not completed chunks, as always when files.ini is not available anymore. _________________ Link overview and precompiled cores here: http://mldonkey.sourceforge.net/DownloadLinks
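The renaming step above could be scripted instead of done by hand. This is a sketch only, using standard `Sys` and `Filename` calls; it assumes the directory holds nothing but old-style ed2k temp files named after their MD4 hash, and it should be run on a copy of the temp directory. The function names are invented for the example.

```ocaml
let prefix = "urn:ed2k:"

let has_prefix name =
  String.length name >= String.length prefix
  && String.sub name 0 (String.length prefix) = prefix

(* New name for an old-style temp file; already-renamed files are kept. *)
let new_name name = if has_prefix name then name else prefix ^ name

(* Rename every entry of [dir]. Assumption: [dir] contains only ed2k
   temp files; any other entry would be renamed as well. *)
let rename_temp_files dir =
  Array.iter
    (fun name ->
      let target = new_name name in
      if target <> name then
        Sys.rename (Filename.concat dir name) (Filename.concat dir target))
    (Sys.readdir dir)
```

Calling `rename_temp_files` twice on the same directory is harmless, because files that already carry the prefix are skipped.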
Last edited by spiralvoice on Wed Feb 09, 2005 12:48 pm; edited 1 time in total |
Surround user
Joined: 13 May 2004 Posts: 194 Location: Germany
|
Posted: Tue Feb 08, 2005 11:58 pm Post subject: |
|
|
@spiralvoice:
Hehe, are you looking for someone to try that, to be sure you won't lose your files?
EDIT.: oops, you're fast ... your second post wasn't there when I wrote this ...
Last edited by Surround on Wed Feb 09, 2005 12:05 am; edited 1 time in total |
spiralvoice Sage
Joined: 06 Jan 2003 Posts: 3983 Location: Germany
|
Posted: Wed Feb 09, 2005 12:00 am Post subject: |
|
|
| Surround wrote: | Hehe, are you looking for someone to try that, to be sure you won't lose your files? |
Yes, of course. I don't want to buy a new, bigger HDD to be able to *copy* my temp files. |
Surround user
Joined: 13 May 2004 Posts: 194 Location: Germany
|
Posted: Wed Feb 09, 2005 12:13 am Post subject: |
|
|
| spiralvoice wrote: | | Surround wrote: | Hehe, are you looking for someone to try that, to be sure you won't lose your files? |
Yes, of course. I don't want to buy a new, bigger HDD to be able to *copy* my temp files. |
I heard a rumour about some new technology out there ... similar to EPROM ... but not just with 32KB, but with 1 million times more memory !!! ... they call it DVD-RW!
Now seriously, about your suggested method ... I would rather configure my existing mldonkey to use entirely different ports ... and then fire up a 2.5.29 on the standard ports and download everything from 2.5.28, before I would mess around with everything as you suggested. |
spiralvoice Sage
Joined: 06 Jan 2003 Posts: 3983 Location: Germany
|
Posted: Wed Feb 09, 2005 12:16 am Post subject: |
|
|
| Surround wrote: | | download everything from 2.5.28 |
ehm, wasn't it you who wrote that this version does not report all available chunks? |
Surround user
Joined: 13 May 2004 Posts: 194 Location: Germany
|
Posted: Wed Feb 09, 2005 12:45 am Post subject: |
|
|
| spiralvoice wrote: | | Surround wrote: | | download everything from 2.5.28 |
ehm, wasn't it you who wrote that this version does not report all available chunks? |
Oh, it reports some random chunks for incomplete files ... just takes a little longer ... but that's hardly a problem, you know loopback is fast ... really fast ... and don't forget, I already have the fix ready here!
Well, you reminded me ... wasn't it you who wrote that mldonkey reports *all* chunks, even if it has nothing? ... You're proven wrong: for complete files mldonkey reports no chunks at all, and for incomplete files you can read the answer above.
So much for your "releasing" troubles ... just one thing makes me wonder ... how does mldonkey manage to upload anything at all for complete files!? Buggy clients that misinterpret a "no-bitmap-at-all-query-chunks-reply"?
Or lucky clients that receive some more bytes from our mldonkey in one of the following requests and treat these bytes as a bitmap? |
conradj user
Joined: 12 Jan 2005 Posts: 57 Location: Swirling clouds of Jupiter
|
Posted: Wed Feb 09, 2005 2:44 am Post subject: |
|
|
My temp directory typically takes 55% - 60% of my disk, and my machine is full up for disks, so I'm unable to go from 2.5.28 to 2.5.29 by starting up a new core and having it download from the old core.
But hey, that's just me!
--Good catch btw Surround, with the problem of mldonkey mis-reporting available chunks. That's probably why no one's asked me for friend slots since i went to 2.5.28.
cj |
Amorphous skilled
Joined: 04 Sep 2004 Posts: 316
|
Posted: Thu Feb 10, 2005 11:14 pm Post subject: |
|
|
From the code in donkeyComplexOptions.ml (at let value_to_file ... = ... let file_diskname) I can't see why it fails. I have inserted some lprintf calls in the code, but it will take some time until I can test it... (running downloads and exams...)
| Code: |
Index: src/networks/donkey/donkeyComplexOptions.ml
===================================================================
--- src/networks/donkey/donkeyComplexOptions.ml (Revision 168)
+++ src/networks/donkey/donkeyComplexOptions.ml (Arbeitskopie)
@@ -275,13 +275,25 @@
get_value "file_md4" value_to_string
with _ -> failwith "Bad file_md4"
in
- let file_diskname = try
- get_value "file_diskname" value_to_string
- with _ ->
+ let file_diskname =
+ let filename =
+ try
+ get_value "file_diskname" value_to_string
+ with _ ->
let filename = Filename.concat !!temp_directory file_md4 in
- if Sys.file_exists filename then filename else
- Filename.concat !!temp_directory
+ lprintf "testing ed2k-temp-file %s .\n" filename;
+ if Sys.file_exists filename then
+ filename
+ else
+ Filename.concat !!temp_directory
(Printf.sprintf "urn:ed2k:%s" file_md4)
+ in
+ if not (Sys.file_exists filename) then
+ (* I think we should die here, to prevent any corruption. *)
+ lprintf "ERROR ED2K-TEMP-FILE %s DOES NOT EXIST, THIS WILL PERHAPS LEAD TO CORRUPTION IN THAT DOWNLOAD!!!!!!!!!!!!!!!\n"
+ filename;
+ lprintf "ed2k-temp-file %s used.\n" filename;
+ filename
in
let filenames = List.map (fun name -> name, GuiTypes.noips())
|
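The name-resolution logic in the patch above can also be written as a small pure function, which makes the three cases explicit and testable without touching the disk. This is a sketch under stated assumptions, not the actual fix: the `resolution` type and `resolve_temp_file` are hypothetical, and the lookup order (old MD4 name first, then the "urn:ed2k:" name) simply mirrors the patch.

```ocaml
type resolution =
  | Use of string     (* an existing temp file to keep writing to *)
  | Create of string  (* nothing on disk: a new, empty file would be made *)

(* [exists] is injected so the logic can be tested in isolation; in real
   use it would be [Sys.file_exists]. Hypothetical helper. *)
let resolve_temp_file ~exists ~temp_dir ~md4 =
  let old_name = Filename.concat temp_dir md4 in
  let new_name = Filename.concat temp_dir ("urn:ed2k:" ^ md4) in
  if exists old_name then Use old_name
  else if exists new_name then Use new_name
  else Create new_name
```

A caller that receives `Create` for a download already listed in files.ini knows that the verified chunks recorded there have no data behind them, which is the situation the patch only logs.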
White_FrosT skilled
Joined: 02 Sep 2003 Posts: 422
|
Posted: Fri Feb 11, 2005 10:11 am Post subject: |
|
|
| Quote: | Remember to run verify_chunks on all files and you should be up and running...
You will lose all not completed chunks, as always when files.ini is not available anymore. |
But then there is still this 'other' command that might result in some more parts of chunks to be recovered:
| Quote: | | recover_bytes <f1> <f2> ... : try to recover these files at byte level |
edit: just thought of something. If someone has done this and we have a files.ini from before and one from after, we might write a small tool to convert the files. Then there would no longer be any problem with updating. |
Amorphous skilled
Joined: 04 Sep 2004 Posts: 316
|
|