Feature description

AICH hashes are based on SHA1
Detailed feature description

Savannah tracker


Code snippets


#define OP_AICHREQUEST			0x9B	// <HASH 16><uint16><HASH aichhashlen>
#define OP_AICHANSWER			0x9C	// <HASH 16><uint16><HASH aichhashlen> <data>
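
As a rough illustration of the layout given in the comments above, the following sketch serializes an OP_AICHREQUEST payload. The helper name build_aich_request, the 20-byte AICH hash length (the SHA1 digest size) and the little-endian encoding of the 16-bit field are assumptions of this sketch; the snippet above states none of them, and the 16-bit field is treated here as an opaque number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_AICHREQUEST   0x9B
#define MD4_HASH_LEN     16   /* <HASH 16>: MD4 file hash */
#define AICH_HASH_LEN    20   /* assumed aichhashlen: SHA1 digest size */
#define AICH_REQUEST_LEN (MD4_HASH_LEN + 2 + AICH_HASH_LEN)

/* Hypothetical helper: writes <HASH 16><uint16><HASH aichhashlen> into out.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t build_aich_request(uint8_t *out, size_t out_len,
                          const uint8_t file_hash[MD4_HASH_LEN],
                          uint16_t field,
                          const uint8_t aich_hash[AICH_HASH_LEN])
{
    assert(out != NULL && file_hash != NULL && aich_hash != NULL);
    if (out_len < AICH_REQUEST_LEN)
        return 0;
    memcpy(out, file_hash, MD4_HASH_LEN);
    /* low byte first: little-endian is an assumption of this sketch */
    out[MD4_HASH_LEN]     = (uint8_t)(field & 0xFF);
    out[MD4_HASH_LEN + 1] = (uint8_t)(field >> 8);
    memcpy(out + MD4_HASH_LEN + 2, aich_hash, AICH_HASH_LEN);
    return AICH_REQUEST_LEN;
}
```

Under these assumptions a request payload is 38 bytes long; the opcode byte itself would go into the packet header, which is not modelled here.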


// for this version the limits are set very high, they might be lowered later
// to make a hash trustworthy, at least MINUNIQUEIPS_TOTRUST unique IPs must have sent it
// and if we have received more than one hash for the file, one hash has to be sent by at least
// MINPERCENTAGE_TOTRUST percent of all unique IPs
#define MINUNIQUEIPS_TOTRUST		10	// how many unique IPs must have sent us a hash to make it trustworthy
#define MINPERCENTAGE_TOTRUST		92	// what percentage of clients must have sent the same hash to make it trustworthy
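
The two limits above could be combined into a single check along these lines. This is only a sketch: the function name aich_hash_is_trusted, its parameters, the integer rounding and the use of >= for the percentage comparison are assumptions, not taken from the snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MINUNIQUEIPS_TOTRUST   10
#define MINPERCENTAGE_TOTRUST  92

/* Hypothetical check.
 * ips_for_hash: unique IPs that sent this particular hash
 * ips_total:    unique IPs that sent any hash for the file */
bool aich_hash_is_trusted(uint32_t ips_for_hash, uint32_t ips_total)
{
    assert(ips_for_hash <= ips_total);
    if (ips_for_hash < MINUNIQUEIPS_TOTRUST)
        return false;
    /* ips_total >= ips_for_hash >= 10 here, so the division is safe;
     * the percentage is rounded down */
    return (100ull * ips_for_hash) / ips_total >= MINPERCENTAGE_TOTRUST;
}
```

With these values, 92 of 100 unique IPs agreeing on one hash is enough, while 10 of 11 is not (90 percent after rounding down).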


The SHA hashset basically consists of 1 tree for all parts (9.28 MB each) + n trees
for all blocks (180 KB each), where n is the number of parts.
This means it is NOT a complete hash tree, since the 9.28 MB part size is a fixed level,
chosen in order to be able to create a hashset format similar to the MD4 one.
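
To make the two levels concrete, here is a small sketch that counts parts and blocks for a given file size. The constants 9728000 and 184320 are the byte values assumed here for the 9.28 MB part size and the 180 KB block size; the function names are hypothetical, and any special handling of files whose size is an exact multiple of the part size is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define AICH_PARTSIZE  9728000ull  /* assumed byte value of "9.28 MB" */
#define AICH_BLOCKSIZE 184320ull   /* assumed byte value of "180 KB"  */

/* Number of parts, i.e. leaves of the top tree (rounded up). */
uint64_t aich_part_count(uint64_t file_size)
{
    return (file_size + AICH_PARTSIZE - 1) / AICH_PARTSIZE;
}

/* Length in bytes of part number `part` (0-based); only the last part can be short. */
uint64_t aich_part_len(uint64_t file_size, uint64_t part)
{
    assert(part < aich_part_count(file_size));
    uint64_t rest = file_size - part * AICH_PARTSIZE;
    return rest < AICH_PARTSIZE ? rest : AICH_PARTSIZE;
}

/* Number of blocks, i.e. leaves of one part tree (rounded up). */
uint64_t aich_block_count(uint64_t part_len)
{
    return (part_len + AICH_BLOCKSIZE - 1) / AICH_BLOCKSIZE;
}
```

For a 19506000-byte file this gives 3 parts: two full parts of 53 blocks each (52 full blocks plus one of 143360 bytes, i.e. 140 KB) and a last part of 50000 bytes holding a single block.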

If the number of elements for the next level is odd (for example 21 blocks to spread across 2 hashes),
the majority of elements will go into the left branch if the parent node was a left branch
and into the right branch if the parent node was a right branch. The first node is always
taken as a left branch.
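
The splitting rule described above can be sketched as follows. The struct and function names are hypothetical; only the rule itself comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t left;   /* elements assigned to the left child  */
    uint64_t right;  /* elements assigned to the right child */
} aich_split;

/* Splits `count` elements between two children. For an odd count the
 * extra element goes to the side the parent itself is on. */
aich_split aich_split_children(uint64_t count, bool parent_is_left)
{
    assert(count >= 2);
    aich_split s;
    s.left  = parent_is_left ? (count + 1) / 2 : count / 2;
    s.right = count - s.left;
    return s;
}
```

For 21 blocks, a left parent yields 11 + 10 and a right parent yields 10 + 11; an even count is always split in half.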

Example tree:
	FileSize: 19506000 Bytes = 18.6 MB

                      X (18.6)                               MasterHash
                     /        \
              X (18.55)        \
             /         \        \
       X(9.28)       X(9.28)     X (0.05MB)                  PartHashes
       /     \       /     \          \
 X(4.75) X(4.57) X(4.57) X(4.75)       \

X(180KB) X(180KB) [...] X(140KB) | X(180KB) X(180KB) [...]   BlockHashes
           Border between first and second Part (9.28MB)

When hashes are sent, they are sent with a 16-bit identifier which specifies their position in the
tree (so StartPosition + HashDataSize would lead to the same hash).
The identifier basically describes the way from the top of the tree to the hash. A set bit (1)
means follow the left branch, a 0 means follow the right. The highest bit which is set is seen as the
start position (since the first node is always seen as left).


                      x                       0000000000000001
                     / \
                    x   \                     0000000000000011
                   / \   \
                  x  _X_  x                   0000000000000110
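
The identifier scheme illustrated above can be sketched in both directions: building an identifier from a path, and recovering the path from an identifier. The function names and the bool-array representation of a path are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes a path from the root: path[i] is true for "left", false for "right".
 * The root contributes the leading 1 bit (it always counts as a left branch). */
uint16_t aich_identifier(const bool *path, size_t depth)
{
    assert(depth <= 15);
    uint16_t id = 1;
    for (size_t i = 0; i < depth; i++)
        id = (uint16_t)((id << 1) | (path[i] ? 1u : 0u));
    return id;
}

/* Decodes an identifier: finds the highest set bit (the root), then reads
 * the remaining bits as left/right steps. Returns the depth. */
size_t aich_identifier_path(uint16_t id, bool *path, size_t max_depth)
{
    assert(id != 0);
    int top = 15;
    while (!(id & (1u << top)))
        top--;
    assert((size_t)top <= max_depth);
    for (int i = 0; i < top; i++)
        path[i] = ((id >> (top - 1 - i)) & 1u) != 0;
    return (size_t)top;
}
```

The marked node _X_ in the drawing is reached by going left, then right, which encodes to binary 110, matching the identifier shown next to it.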

Version 2 of AICH also supports 32-bit identifiers for large files; see CAICHHashSet::CreatePartRecoveryData.


Some design thoughts

[16:40:40] <spiralvoice> pango: or create a new subdir in $MLDONKEY_DIR and keep ini files with aich hashes
there named after the md4 hash
[16:40:52] <spiralvoice> pango: one ini file per md4 hash
[16:40:59] <pango> yes, that's a possibility
[16:41:32] <pango> not too efficient, but should at least work
[16:48:29] <pango> parsing, the packet computation, then garbage collection of the hashes will take some time and cpu...
maybe we'll need to kind of throttling to avoid DoSes
