MLDonkey Forum Index
Interview with bavard about swarming/29 bug

 
mbh
neophyte


Joined: 17 Dec 2004
Posts: 21
Location: USA

Posted: Mon Jan 31, 2005 10:37 am    Post subject: Interview with bavard about swarming/29 bug

I had several questions about the swarming system [and the 29 bug].
I have posted this Q&A session here. If people have more questions, they could post them as replies, so that further e-mails can be sent out.

From: Fabrice <e-mail withheld>

Here are some replies to what you asked. You can of course share them as widely as you like.
First, we are discussing src/daemon/common/commonSwarming2.ml, the second version of the swarming system.

> 1) What is the difference between a block and a chunk?

In my terminology, which is not always consistent, a "chunk" is a segment of data for a particular network. Chunks for eDonkey are 9.5 MB segments, and they all have the same size. A "block" is a segment of data internal to the swarming system; the blocks of one file do not all have the same size. Suppose you are downloading a file from eDonkey and BitTorrent, where the BT chunks for that file are 1 MB long. The file is then cut internally so that every block is completely included in one chunk of each network. To achieve that, you must cut the file every 9.5 MB and every 1 MB, so that you get:

[1MB] [1MB] [1MB] [1MB] [1MB] [1MB] [1MB] [1MB] [1MB] [0.5MB][0.5MB]
[1MB] [1MB] etc...

The 10th and 11th blocks are 0.5 MB each: the first 10 blocks (nine 1 MB blocks plus one 0.5 MB block) make up the first 9.5 MB eDonkey chunk, while the 10th and 11th blocks together make up the 10th 1 MB BT chunk (from 9 MB to 10 MB).
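The cutting rule above (every block boundary is a chunk boundary of at least one network) can be sketched in OCaml as a merge of each network's chunk boundaries. This is a minimal illustration, not code from commonSwarming2.ml: the function names are hypothetical, sizes are in bytes, and 1 MB is taken as 1,048,576 bytes so that the 9.5 MB eDonkey chunk of the example is exactly representable.

```ocaml
(* Hypothetical helper: offsets of all chunk boundaries strictly inside the file. *)
let chunk_boundaries ~file_size chunk_size =
  let rec go pos acc =
    if pos >= file_size then List.rev acc
    else go (pos + chunk_size) (pos :: acc)
  in
  go chunk_size []

(* Cut the file at the chunk boundaries of every network and return the
   resulting blocks as (start, stop) byte offsets, stop exclusive. *)
let blocks_of_chunk_sizes ~file_size chunk_sizes =
  let cuts =
    List.concat (List.map (chunk_boundaries ~file_size) chunk_sizes)
    |> List.sort_uniq compare
  in
  let rec build start cuts acc =
    match cuts with
    | [] -> List.rev ((start, file_size) :: acc)
    | c :: rest -> build c rest ((start, c) :: acc)
  in
  build 0 cuts []

let mb = 1_048_576

(* The example from the text: a 12 MB file downloaded from eDonkey
   (9.5 MB chunks) and BitTorrent (1 MB chunks). *)
let example = blocks_of_chunk_sizes ~file_size:(12 * mb) [ 19 * mb / 2; mb ]
```

With these sizes, example has 13 blocks, and the 10th and 11th are (9 MB, 9.5 MB) and (9.5 MB, 10 MB): the two 0.5 MB blocks from the diagram.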

> 2) What is the difference between the swarmer type and the type t of
> the module system? Analogous to this question, what is the difference
> between a network block and a swarmer block?

The "swarmer" type is the record corresponding to a particular file being downloaded to disk. The "t" type is an interface to that record for a particular network. So for one file you have one "swarmer" record and several "t" records, one for each network the file is downloaded from. One of them is the "primary", i.e. the network whose checksums are used to decide whether a particular block has been downloaded correctly. When all blocks have been verified with the primary checksums, the download is finished.

"t" also contains conversion arrays between network chunk numbers and internal swarmer blocks: t.t_t2s_blocks.(i) contains the list of internal block numbers corresponding to chunk "i" of the network, whereas t.t_s2t_blocks.(i) contains the number of the network chunk that contains internal block "i".
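A rough sketch of how the two conversion arrays relate for one network. It assumes, as the cutting rule guarantees, that each internal block lies entirely inside one network chunk, and it represents blocks as (start, stop) byte offsets. The function name and the plain-array construction are hypothetical; this is not how commonSwarming2.ml actually fills t_t2s_blocks and t_s2t_blocks.

```ocaml
(* Build t2s (chunk -> internal blocks) and s2t (internal block -> chunk)
   for one network with fixed-size chunks. *)
let conversion_arrays ~chunk_size ~nchunks (blocks : (int * int) list) =
  let blocks = Array.of_list blocks in
  (* A block never straddles a chunk boundary, so its start offset
     identifies the chunk that contains it. *)
  let s2t = Array.map (fun (start, _stop) -> start / chunk_size) blocks in
  let t2s = Array.make nchunks [] in
  (* Walk backwards so that each list ends up in increasing block order. *)
  for i = Array.length blocks - 1 downto 0 do
    let c = s2t.(i) in
    t2s.(c) <- i :: t2s.(c)
  done;
  (t2s, s2t)
```

For a 6-byte file with 5-byte chunks, cut into blocks (0,2), (2,4), (4,5) and (5,6), this yields t2s = [|[0; 1; 2]; [3]|] and s2t = [|0; 0; 0; 1|].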

> 3) What is the purpose of the uploader type?

The "uploader" is a source for a file, to which you are connected and from which you can start downloading chunks. It holds all the data about that source that the swarming system needs, such as "up_chunks", the chunks available from that source; "up_block", the chunk currently being downloaded; "up_complete_blocks", the list of complete chunks that can be downloaded; and "up_ranges", the ranges that have already been requested from that client but not yet received.
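To make the field list concrete, here is a deliberately simplified OCaml record with the four fields named above, plus the bookkeeping for up_ranges. The field types and all helper functions are assumptions for illustration only; the real uploader record carries more state and uses different types.

```ocaml
(* Simplified, hypothetical version of the uploader record. *)
type uploader = {
  up_chunks : bool array;               (* chunks this source has *)
  mutable up_block : int option;        (* chunk currently being downloaded *)
  up_complete_blocks : int array;       (* indices of complete chunks we may request *)
  mutable up_ranges : (int * int) list; (* requested ranges not yet received *)
}

let make_uploader chunks =
  let n = Array.length chunks in
  let complete = List.filter (fun i -> chunks.(i)) (List.init n (fun i -> i)) in
  { up_chunks = chunks; up_block = None;
    up_complete_blocks = Array.of_list complete; up_ranges = [] }

(* Remember a range we have just asked this source for. *)
let request_range up range = up.up_ranges <- up.up_ranges @ [ range ]

(* Forget a range once its data has arrived. *)
let range_received up range =
  up.up_ranges <- List.filter (fun r -> r <> range) up.up_ranges
```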

> Functional questions
> 1) The swarming system seems to only be truly compatible with
> protocols in which files are downloaded in discrete ranges. However,
> many protocols don't implement such ranged downloads: for instance, in
> HTTP, many files do not provide a Content-Length field in the header,
> and in FTP, you can only specify a starting point for a file, not an
> ending point.
> Can the swarming system be used to download such files efficiently?

Currently, such networks declare small chunks (I think about 1 MB). The problem is indeed that they must request a range within one chunk, although they could potentially download several chunks in one request. One solution would be to allow several blocks to be requested from an uploader in one request (putting a list in the "up_block" field), and then to add a function returning the longest set of blocks that can be downloaded after the block returned by "find_block".
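The proposed helper could look roughly like this. It is a hypothetical sketch: it assumes a boolean array marking blocks that are still missing locally and available from the uploader, plus a starting block such as the one returned by find_block. The name contiguous_blocks_after and the max_blocks cap are illustrative, not existing mldonkey functions.

```ocaml
(* [wanted.(i)] is true when block i is still missing locally and the
   uploader has it. Starting at [first] (e.g. the block picked by
   find_block), collect the run of consecutive wanted blocks, capped at
   [max_blocks], so that it can be fetched in a single request. *)
let contiguous_blocks_after ~wanted ~max_blocks first =
  let n = Array.length wanted in
  let rec go i count acc =
    if i >= n || count >= max_blocks || not wanted.(i) then List.rev acc
    else go (i + 1) (count + 1) (i :: acc)
  in
  go first 0 []
```

For wanted = [|false; true; true; true; false; true|] and a start at block 1, it returns [1; 2; 3], which could then be stored in a list-valued up_block and requested as one range.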

> 2) Is it possible to write files to disk without making use of the
> swarming system?

Yes, but there is no function to extract the file from the "t" record. You could add "let get_file t = t.t_file" to the module and "val get_file : t -> file" to the interface, so that you can use the returned file with the CommonFile functions.
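A self-contained sketch of that two-line change, with a toy module standing in for commonSwarming2.ml. The module and signature names are hypothetical, and file is reduced to a string here; in mldonkey it would be the core's own file type.

```ocaml
(* Toy interface: only the parts needed to show the accessor. *)
module type SWARMER_T = sig
  type file
  type t
  val create : file -> t
  val get_file : t -> file (* the accessor proposed above *)
end

module Swarmer_t : SWARMER_T with type file = string = struct
  type file = string (* stand-in for mldonkey's file type *)
  type t = { t_file : file }
  let create f = { t_file = f }
  let get_file t = t.t_file
end
```

Because t stays abstract in the interface, get_file is the only way for other modules to reach t_file, which is exactly why the accessor has to be exported.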

> 3) There is a mention in the 2.5.29 Changelog about a serious bug.
> Several people (including myself) believe the bug is related to some
> chunks becoming corrupt while upgrading from .28 to .29. Do you have
> any more information about this serious bug, perhaps where in the code
> it occurs, or ways to reproduce it?

If I remember correctly, it creates a new file with a different name and then writes to that file instead of the former one. The result is that you have two files, one with the data downloaded before and one with the data downloaded after. When the download is finished, only one file is saved, containing only the data downloaded after, while the chunks that were in the former file are lost (they are not checked again, since they were assumed to be already downloaded correctly). When I noticed that behavior, I decided that the version had not been checked enough before the release, so I wanted to test it more before doing another release...

I hope this helps. I can go into more detail if you need. In particular, I could write comments for functions or fields on demand.

Best regards, - Fabrice
Amorphous
skilled


Joined: 04 Sep 2004
Posts: 316

Posted: Mon Jan 31, 2005 3:45 pm

So if I understand that correctly, "the bug" is that the naming scheme in temp changed, and on upgrade a new file with the new scheme is created and used (instead of the old file being renamed), so the already verified chunks are left behind in the old file. When the download finishes, those chunks are still in the old temp file, and the finished download is corrupt. Can anyone reproduce that?
mbh
neophyte


Joined: 17 Dec 2004
Posts: 21
Location: USA

Posted: Mon Jan 31, 2005 4:02 pm    Post subject: how to test

The best way to test this would be to add a very large file (~500 MB) and perform an upgrade when this file is almost done (say 90%). If, after the file is completed, it drops back to approximately 10%, then this is the cause of the bug. A fix should be trivial.

/me adds a file to the queue
spiralvoice
Sage


Joined: 06 Jan 2003
Posts: 3983
Location: Germany

Posted: Tue Feb 08, 2005 11:54 pm

Hi,

!!! THIS IS A PROPOSAL FOR UPDATING TO 2.5.29 CORES, USE IT AT YOUR OWN RISK !!!
!!! IF YOU LOSE DATA, YOU HAVE BEEN WARNED !!!


I finally ran some tests. I took a file that was 88.7% complete from the temp directory of my running 2.5.28r core.

This file went into the temp directory of a vanilla CVS 2.5.29. I started the core and it created another file, with "urn:ed2k:" prepended to the filename (= the MD4 hash value), which was empty(!).
So the original file was named like this: ABABABABA1717171717171, the new file urn:ed2k:ABABABABA1717171717171
Logfile showed this:
Code:
VERIFICATION OF BLOC 0 OF file FAILED
  Swarmer block was complete. Removing data...

Block numbers were: 0, 0, 1, 2, 3, ..., 71

What about the two zeros? Maybe that is the bug?
So this should be the bug b8_bavard mentioned.
Fabrice wrote:
If I remember correctly, I think it creates a new file with a different name, and then writes to that file instead of the former one. The result is that you have two files, one with the data downloaded before and the second with the data downloaded after.


I tried something different: I stopped the core, deleted temp, copied that temp file in again and renamed it to urn:ed2k:<md4>. I started the core, but it did not recognize the file, even after recover_temp. So I dllink'ed the ed2k link of that file into the core; it created a download in vd and files.ini and used the already existing temp file, phew...
It had 100% availability. I remembered that situation from the upgrade from 2.5.16 to 2.5.28, so I issued verify_chunks. This is the logfile output:
Code:
VERIFICATION OF BLOC 14 OF file FAILED
  Swarmer block was complete. Removing data...

Other block numbers: 21, 27, 29, 31, 38, 47, 50, 56, 58

So it looks better: 86% of the file was confirmed as already downloaded, down from 88.7%...

Chunks look like this:
in 2.5.29:
Code:
333333333333330333333033333030303333330333333330330333330303333333333333

in 2.5.28
Code:
333333333333311133313133333131313133331333333131331333331313313333333333
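The chunk strings can be checked mechanically. Assuming '3' marks a verified chunk and '0' a chunk with no data (which matches the percentages quoted here), this OCaml snippet lists the '0' positions in the 2.5.29 string and the verified share. The helper names are made up for this post.

```ocaml
let chunks_2_5_29 =
  "333333333333330333333033333030303333330333333330330333330303333333333333"

(* Positions (0-based) of every occurrence of [c] in [s]. *)
let indices_of c s =
  let acc = ref [] in
  String.iteri (fun i ch -> if ch = c then acc := i :: !acc) s;
  List.rev !acc

(* Share of chunks marked verified ('3'), in percent. *)
let percent_verified s =
  let verified = List.length (indices_of '3' s) in
  100.0 *. float_of_int verified /. float_of_int (String.length s)

let () =
  Printf.printf "chunks: %d, missing: %s\n" (String.length chunks_2_5_29)
    (String.concat ", " (List.map string_of_int (indices_of '0' chunks_2_5_29)));
  Printf.printf "verified: %.1f%%\n" (percent_verified chunks_2_5_29)
```

It reports 72 chunks, 86.1% verified, with '0' at positions 14, 21, 27, 29, 31, 38, 47, 50, 56 and 58: the same block numbers that failed in the verify_chunks log.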


So, to keep it short:
Unshare all your shared directories to keep the links list clean of shared files.
Restart the old (non-2.5.29) core because of this bug.
Issue the "links" command and copy the output somewhere safe.
Create a new directory with a 2.5.29 core, start it and stop it.
Copy (or move, if you are brave) your old temp files into the new temp directory.
Rename all files so they start with "urn:ed2k:" followed by the old name.

Start the 2.5.29 core and check whether "dllinks" works for you to restart the downloads.
If not, you have to enter all ed2k links into the core manually...

Remember to run verify_chunks on all files and you should be up and running...
You will lose all incomplete chunks, as always when files.ini is no longer available.
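Renaming many temp files by hand is error-prone, so here is a small OCaml sketch of the rename step. It assumes the old temp files are named after the file's MD4 hash as 32 hexadecimal characters (the name in the example above is shortened) and that the core is stopped. The helper names are hypothetical, and the sketch refuses to overwrite an existing urn:ed2k: file.

```ocaml
let urn_prefix = "urn:ed2k:"

(* Assumption: an old-style temp file name is the MD4 hash as 32 hex digits. *)
let is_hex_md4 name =
  let is_hex = function '0' .. '9' | 'a' .. 'f' | 'A' .. 'F' -> true | _ -> false in
  let ok = ref (String.length name = 32) in
  String.iter (fun c -> if not (is_hex c) then ok := false) name;
  !ok

(* New name for an old-style temp file, or None if it should be left alone. *)
let new_temp_name name =
  if is_hex_md4 name then Some (urn_prefix ^ name) else None

(* Rename every old-style temp file in [dir]; returns the (old, new) pairs. *)
let rename_temp_files dir =
  Sys.readdir dir
  |> Array.to_list
  |> List.filter_map (fun name ->
         match new_temp_name name with
         | None -> None
         | Some target ->
             let src = Filename.concat dir name in
             let dst = Filename.concat dir target in
             if Sys.file_exists dst then None (* never clobber an existing file *)
             else begin
               Sys.rename src dst;
               Some (name, target)
             end)
```

Running rename_temp_files on the new temp directory after copying the old temp files, and before starting the 2.5.29 core again, covers only the rename step; the other steps still have to be done as described.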
_________________
Link overview and precompiled cores here: http://mldonkey.sourceforge.net/DownloadLinks


Last edited by spiralvoice on Wed Feb 09, 2005 12:48 pm; edited 1 time in total
Surround
user


Joined: 13 May 2004
Posts: 194
Location: Germany

Posted: Tue Feb 08, 2005 11:58 pm

@spiralvoice:

Hehe, are you looking for someone to try that, to be sure you won't lose your files?

EDIT: oops, you're fast... your second post wasn't there when I wrote this...


Last edited by Surround on Wed Feb 09, 2005 12:05 am; edited 1 time in total
spiralvoice
Sage


Joined: 06 Jan 2003
Posts: 3983
Location: Germany

Posted: Wed Feb 09, 2005 12:00 am

Surround wrote:
Hehe, are you looking for someone to try that, to be sure you won't lose your files?

Yes, of course. I don't want to buy a new, bigger HDD to be able to *copy* my temp files.
Surround
user


Joined: 13 May 2004
Posts: 194
Location: Germany

Posted: Wed Feb 09, 2005 12:13 am

spiralvoice wrote:
Surround wrote:
Hehe, are you looking for someone to try that, to be sure you won't lose your files?

Yes, of course. I don't want to buy a new, bigger HDD to be able to *copy* my temp files.


I heard a rumour about some new technology out there... similar to EPROM... but not with just 32KB, with a million times more memory!!! ... they call it DVD-RW!


Now, seriously, about your suggested method... I would rather configure my existing mldonkey to use entirely different ports, then fire up a 2.5.29 on the standard ports and download everything from 2.5.28, before I would mess around with everything as you suggested.
spiralvoice
Sage


Joined: 06 Jan 2003
Posts: 3983
Location: Germany

Posted: Wed Feb 09, 2005 12:16 am

Surround wrote:
download everything from 2.5.28

Ehm, wasn't it you who wrote that this version does not report all available chunks?
Surround
user


Joined: 13 May 2004
Posts: 194
Location: Germany

Posted: Wed Feb 09, 2005 12:45 am

spiralvoice wrote:
Surround wrote:
download everything from 2.5.28

Ehm, wasn't it you who wrote that this version does not report all available chunks?


Oh, it reports some random chunks for incomplete files... it just takes a little longer... but that's hardly a problem, you know loopback is fast... really fast... and don't forget, I already have the fix ready here!

Well, you reminded me... wasn't it you who wrote that mldonkey reports *all* chunks, even if it has nothing? ... You're proven wrong: for complete files mldonkey reports no chunks at all, and for incomplete files you can read the answer above.
So much for your "releasing" troubles... just one thing makes me wonder... how does mldonkey manage to upload anything at all for complete files!? Buggy clients that misinterpret a "no-bitmap-at-all" query-chunks reply?
Or lucky clients that receive a few more bytes from our mldonkey in one of the following requests and treat those bytes as the bitmap?
conradj
user


Joined: 12 Jan 2005
Posts: 57
Location: Swirling clouds of Jupiter

Posted: Wed Feb 09, 2005 2:44 am

My temp directory typically takes 55%-60% of my disk, and my machine has no room for more disks, so I'm unable to go from 2.5.28 to 2.5.29 by starting up a new core and having it download from the old core.

But hey, that's just me!

--Good catch btw, Surround, on mldonkey misreporting available chunks. That's probably why no one has asked me for friend slots since I went to 2.5.28.

cj
Amorphous
skilled


Joined: 04 Sep 2004
Posts: 316

Posted: Thu Feb 10, 2005 11:14 pm

From the code in donkeyComplexOptions.ml (at let value_to_file ... = ... let file_diskname) I can't see why it fails. I have inserted some lprintf calls into the code, but it will be a while before I can test it... (running downloads and exams...)

Code:

Index: src/networks/donkey/donkeyComplexOptions.ml
===================================================================
--- src/networks/donkey/donkeyComplexOptions.ml (Revision 168)
+++ src/networks/donkey/donkeyComplexOptions.ml (Arbeitskopie)
@@ -275,13 +275,25 @@
       get_value "file_md4" value_to_string
     with _ -> failwith "Bad file_md4"
   in
-  let file_diskname = try
-      get_value "file_diskname" value_to_string
-    with _ ->
+  let file_diskname =
+    let filename =
+      try
+        get_value "file_diskname" value_to_string
+      with _ ->
         let filename = Filename.concat !!temp_directory file_md4 in
-        if Sys.file_exists filename then filename else
-          Filename.concat  !!temp_directory
+        lprintf "testing ed2k-temp-file %s .\n" filename;
+        if Sys.file_exists filename then
+          filename
+        else
+          Filename.concat !!temp_directory
             (Printf.sprintf "urn:ed2k:%s" file_md4)
+    in
+    if not (Sys.file_exists filename) then
+      (* I think we should die here, to prevent any corruption. *)
+      lprintf "ERROR ED2K-TEMP-FILE %s DOES NOT EXIST, THIS WILL PERHAPS LEAD TO CORRUPTION IN THAT DOWNLOAD!!!!!!!!!!!!!!!\n"
+        filename;
+    lprintf "ed2k-temp-file %s used.\n" filename;
+    filename
   in
 
   let filenames = List.map (fun name -> name, GuiTypes.noips())
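The comment in the patch suggests stopping rather than just logging when the temp file is missing. A minimal sketch of the same lookup order as a standalone OCaml function that returns an error instead of a name that does not exist. The function name is hypothetical, and the existence check is passed in as exists (Sys.file_exists in practice) so the logic can be exercised without touching the disk.

```ocaml
(* Same order as the patch: the file_diskname from files.ini if present,
   else the old <md4> name if that file exists, else urn:ed2k:<md4>.
   Unlike the patch, a missing file is reported as an Error. *)
let resolve_temp_file ~exists ~temp_dir ?diskname md4 =
  let legacy = Filename.concat temp_dir md4 in
  let urn = Filename.concat temp_dir ("urn:ed2k:" ^ md4) in
  let candidate =
    match diskname with
    | Some name -> name
    | None -> if exists legacy then legacy else urn
  in
  if exists candidate then Ok candidate
  else Error (Printf.sprintf "ed2k temp file %s does not exist" candidate)
```

A caller can then refuse to load the download on Error, which avoids silently starting a fresh, empty temp file next to the old one.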
White_FrosT
skilled


Joined: 02 Sep 2003
Posts: 422

Posted: Fri Feb 11, 2005 10:11 am

Quote:
Remember to run verify_chunks on all files and you should be up and running...
You will lose all not completed chunks, as always when files.ini is not available anymore.

But there is still this 'other' command, which might recover some more parts of chunks:
Quote:
recover_bytes <f1> <f2> ... : try to recover these files at byte level

edit: just thought of something. If someone has done this and we have a files.ini from before and one from after, we could write a small tool to convert the files. Then updating wouldn't be a problem anymore.
Amorphous
skilled


Joined: 04 Sep 2004
Posts: 316

Posted: Sat Feb 26, 2005 4:31 pm

I think I fixed the bug in revision 204 of my branch in the svn.
See https://opensvn.csie.org/viewcvs.cgi?root=mlnet&rev=204&view=rev for more info.
All times are GMT