KEEP IT SIMPLE, STUPID!
-----------------------

Ideas are all over the place for this project. When adding something, let's make sure it's reasonable to do so and doesn't burden the user with overcomplicated concepts. We want the minimal and most elegant solution that achieves our goals.

TASK LIST
=========

- Highest priority: pagehash, sign
- Medium priority: dht, ep, dep

Fix page hash semantics (pagehash, EASY)
-----------------------

Store and transmit the serialized term binary and not the term itself, so that we are sure the serialization is unique and the hash is the same everywhere (sketch below, in the Sketches section).

Terminology: stop using the word "block", use "page" everywhere.

DHT to find peers for a given shard (dht, EASY)
-----------------------------------

First option: use a library for MLDHT. This makes everything simple, but it ties us to UDP, which does not work with Tor (we can fix this later).

Second option: a custom DHT protocol (we probably won't be doing this anytime soon, if ever).

Epidemic broadcast (ep, EASY)
------------------

When a shard receives new information from a peer, transfer that information to some other neighbors. How to select these neighbors?

a. All those that we know of
b. Those that we are currently connected to
c. A random number of known peers

Best option: those that we are currently connected to, plus some random known peers to reach a quota (for example 10 or so); sketch below.

Partial merges, background pulls, caching (cache, req: bg, HARD)
-----------------------------------------

Don't even bother to pull all pages in the background, and don't require all depended-upon pages to be stored locally. Replace that with a cache of the pages we recently/frequently used, plus a way of distributing the storage of the pages over all nodes.

To distribute the pages over all peers, we can use a DHT, for example, or some kind of rendezvous hashing (sketch below). Rendezvous hashing is reliable but requires full connectivity; to alleviate that, we can have only a subset of nodes participate in the distributed storage. They then become the superpeers that everyone calls to get pages. Pages can still be broadcast between secondary peers to alleviate the load on the superpeers: basically, the superpeers are only called for infrequently used pages, for example those of old data that is only kept for archival purposes.

Signed stuff, identity management (sign, MED)
---------------------------------

We want all messages that are stored in our data structures to carry a correct signature from a certain identity (sketch below). We can have a special "identity" shard type that enables storing profile information, such as a nickname or other information that we might want to make public.

Proof-of-concept: a shard for private chat between two people.

User groups and access control (groups, req: sign, HARD)
------------------------------

Groups with member lists, roles, etc. Use these as access control lists for some shards. Enforce access control in two ways: only push information to peers that have proven they are a certain identity, and encrypt this data with a secret key that all group members share (sketch below).

Trust lists (trust, req: sign, MED)
-----------

In their profile (identity shard), people can rate their trust of other people. This information can be combined transitively to evaluate the trust of any individual (sketch below). Maybe we can make a distributed algorithm for a more efficient calculation of these trust values; open research question.
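Sketches (pagehash, ep, cache, sign, groups, trust)
--------

The following are minimal illustrative Elixir sketches for the tasks above, not the real API; module, function and field names are assumptions unless stated otherwise.

For pagehash, assuming pages are Erlang terms and SHA-256 as the hash: serialize once, then store, transmit and hash that same binary, so every node hashes identical bytes.

    defmodule PageHash do
      # Serialize once; store/transmit `bin` itself, never the term,
      # so the hash is computed over the exact same bytes everywhere.
      def encode(term) do
        bin = :erlang.term_to_binary(term)
        {hash(bin), bin}
      end

      def hash(bin) when is_binary(bin), do: :crypto.hash(:sha256, bin)

      # Deserialize only at the point of use.
      def decode(bin) when is_binary(bin), do: :erlang.binary_to_term(bin)
    end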
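For ep, the selected option: all currently connected peers, topped up with random known peers until a quota is reached (`connected` and `known` are assumed to be lists of peer identifiers).

    defmodule Epidemic do
      @quota 10

      # Peers that should receive information we just learned from
      # `source`: everyone we are connected to, plus random known
      # peers to reach the quota.
      def broadcast_targets(connected, known, source) do
        base = Enum.uniq(connected) -- [source]

        extra =
          (Enum.uniq(known) -- [source | base])
          |> Enum.shuffle()
          |> Enum.take(max(@quota - length(base), 0))

        base ++ extra
      end
    end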
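For cache, one way rendezvous (highest-random-weight) hashing could look over the subset of peers acting as superpeers; assumes page hashes and peer identifiers are binaries, with `n` the replication factor.

    defmodule Rendezvous do
      # The n superpeers responsible for a page are those with the
      # highest hash of (page, peer). Every node that knows the same
      # superpeer list computes the same answer, with no coordination.
      def nodes_for_page(page_hash, superpeers, n) do
        superpeers
        |> Enum.sort_by(&:crypto.hash(:sha256, [page_hash, &1]), :desc)
        |> Enum.take(n)
      end
    end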
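For sign, assuming Ed25519 through Erlang's :crypto module (OTP 22+) and that the signed message is already a binary (for example the page binary from the pagehash sketch):

    defmodule Identity do
      def new do
        {pub, priv} = :crypto.generate_key(:eddsa, :ed25519)
        %{public: pub, private: priv}
      end

      # Every message stored in our data structures carries such a
      # signature from the identity that authored it.
      def sign(msg, %{private: priv}) when is_binary(msg),
        do: :crypto.sign(:eddsa, :none, msg, [priv, :ed25519])

      def valid?(msg, sig, pub),
        do: :crypto.verify(:eddsa, :none, msg, sig, [pub, :ed25519])
    end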
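For groups, the shared-key half of the enforcement: encrypt shard data with the group secret using an AEAD cipher from :crypto. How the 32-byte `group_key` is distributed to members is out of scope here.

    defmodule GroupCrypto do
      # Peers outside the group can still store this data, but
      # cannot read it without the shared key.
      def encrypt(plaintext, group_key) do
        iv = :crypto.strong_rand_bytes(12)

        {ct, tag} =
          :crypto.crypto_one_time_aead(
            :chacha20_poly1305, group_key, iv, plaintext, <<>>, true)

        {iv, ct, tag}
      end

      def decrypt({iv, ct, tag}, group_key) do
        :crypto.crypto_one_time_aead(
          :chacha20_poly1305, group_key, iv, ct, <<>>, tag, false)
      end
    end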
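For trust, a naive centralized combination as a starting point: the trust of an individual is the best rating product along any path of bounded depth (ratings assumed in 0.0..1.0). The efficient distributed version is the open question mentioned above.

    defmodule Trust do
      # ratings: %{person => %{other_person => rating}}
      def transitive(ratings, from, to, depth \\ 3)
      def transitive(_ratings, same, same, _depth), do: 1.0
      def transitive(_ratings, _from, _to, 0), do: 0.0

      def transitive(ratings, from, to, depth) do
        ratings
        |> Map.get(from, %{})
        |> Enum.map(fn {mid, r} ->
          r * transitive(ratings, mid, to, depth - 1)
        end)
        |> Enum.max(fn -> 0.0 end)
      end
    end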
Automated access control based on trust (auto, req: trust, groups, HARD)
---------------------------------------

Automated algorithms that take the trust values into account in access control decisions (obviously these can only run when an identity with admin privileges is running).

Shard dependency management (dep, MED)
---------------------------

Some shards may pull other shards in, under certain conditions. For example, a stored folder shard will just be a list of other shards, all of which we pull in. We want a way to have toplevel shards, shards that are dependencies of toplevel shards, and shards that we keep for a number of days as a cache but that expire automatically (see the Sketches section at the end of this file).

COMPLETED TASKS
===============

Block store root & GC handling (gc, QUITE EASY)
------------------------------

We want the block store app to be aware of which blocks are needed and which are not. The Page protocol already implements dependencies between blocks. The block store keeps all pages that have been put, for a given delay. Once the delay has passed, the pages are purged if they are not required by a root we want to keep (sketch below).

Partial sync/background pull for big objects (bg, req: gc, QUITE EASY)
--------------------------------------------

Implement the copy protocol as a lazy call that launches the copy in the background. Remove the callback possibility in MerkleSearchTree.merge so that pulling all the data is not required for a merge. The callback can then only be called on the items that are new among the last n (e.g. 100) items of the resulting tree; this is not implemented in the MST but in the app that uses it, since it is application specific (sketch below).
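Sketches (dep, gc, bg)
--------

Same conventions as the sketches above: illustrative Elixir, assumed names.

For dep, tracking why each shard is kept, so that pinned shards stay and cached ones expire; the 30-day figure is arbitrary.

    defmodule ShardDeps do
      @cache_days 30

      # Why a shard is kept:
      #   :toplevel            -- explicitly wanted by the user
      #   {:dependency, shard} -- pulled in by a toplevel shard
      #   {:cache, date}       -- kept temporarily, expires
      def keep?(:toplevel, _today), do: true
      def keep?({:dependency, _shard}, _today), do: true
      def keep?({:cache, added}, today),
        do: Date.diff(today, added) <= @cache_days
    end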
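For gc, the purge rule described above: a page survives if its retention delay has not passed, or if it is reachable from a root we want to keep (`put_at` and `deps` are assumed page fields).

    defmodule BlockStoreGC do
      # store: %{hash => page}; roots: hashes of the pages to keep.
      def collect(store, roots, now, delay) do
        live = reachable(roots, store)

        store
        |> Enum.filter(fn {hash, page} ->
          now - page.put_at < delay or MapSet.member?(live, hash)
        end)
        |> Map.new()
      end

      # Walk the dependency graph from the roots.
      defp reachable(roots, store), do: walk(roots, store, MapSet.new())

      defp walk([], _store, seen), do: seen

      defp walk([h | rest], store, seen) do
        if MapSet.member?(seen, h) do
          walk(rest, store, seen)
        else
          deps =
            case Map.get(store, h) do
              nil -> []
              page -> page.deps
            end

          walk(deps ++ rest, store, MapSet.put(seen, h))
        end
      end
    end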
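For bg, the lazy call: the copy runs in a background task, and the application is only notified of items that are new among the last n items of the resulting tree (`Copy.run/2` and `MerkleSearchTree.last/2` are assumed helpers, not the real API).

    defmodule BackgroundPull do
      @n 100

      def lazy_copy(shard, source_peer, old_tree) do
        Task.start(fn ->
          new_tree = Copy.run(source_peer, old_tree)

          old_recent = MapSet.new(MerkleSearchTree.last(old_tree, @n))

          # Application-specific notification, kept out of the MST.
          new_tree
          |> MerkleSearchTree.last(@n)
          |> Enum.reject(&MapSet.member?(old_recent, &1))
          |> Enum.each(&send(shard, {:new_item, &1}))
        end)
      end
    end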