KEEP IT SIMPLE, STUPID!
-----------------------
Ideas are all over the place for this project. When adding something, let's
make sure it's reasonable to do so and doesn't burden the user with some
overcomplicated concepts. We want the minimal and most elegant solution that
achieves our goals.
TASK LIST
=========
- Highest priority: comm, sign, ep
- Medium priority: dht, dep
Architecture for communication primitives (comm, MED)
-----------------------------------------
Find the right abstraction(s) for communication channels.
Here are some things to keep in mind that we want at some point:
- Encrypted point to point communication (to communicate private info after ACL check)
- Flooding, gossip, RPS (random peer sampling)
- Congestion control, proper multiplexing of feeds
- Proper management of open connections to peers
RPS question: can we integrate a preference for connections to peers that share the same shards,
all while preventing the network from becoming disconnected?
Ex: keep 100 total open connections that are sampled by proximity on the set of requested shards (Bloom filter),
plus 2 or 5 fully random ones across all shards.
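A minimal sketch of the selection rule above, under loud assumptions: plain
Python sets stand in for the Bloom filter, "proximity" is taken to be the
number of shared shards, and the near peers are picked by rank rather than
sampled. The function name, its parameters and the data layout are all
hypothetical.

```python
import random

def select_connections(my_shards, peers, n_near=100, n_random=5, rng=random):
    """Pick the peers to keep open connections to.

    my_shards: set of shard ids we request (stand-in for a Bloom filter).
    peers: dict mapping peer id -> set of shard ids that peer requests.
    The overlap metric is an assumption, not a decided design.
    """
    # Rank peers by how many requested shards they share with us.
    ranked = sorted(peers, key=lambda p: len(my_shards & peers[p]), reverse=True)
    near = ranked[:n_near]
    # A few uniformly random peers help keep the overall network connected.
    rest = ranked[n_near:]
    extra = rng.sample(rest, min(n_random, len(rest)))
    return near + extra
```

The random extras are drawn only from peers that did not make the proximity
cut, so they never duplicate a near peer.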
DHT to find peers for a given shard (dht, EASY)
-----------------------------------
First option: use a library for MLDHT, makes everything simple but
makes us use UDP which does not work with Tor (can fix this later).
Second option: custom DHT protocol (we probably won't be doing this
anytime soon, if ever at all)
Epidemic broadcast (ep, EASY)
------------------
When a shard receives new information from a peer, transfer that
information to some other neighbors.
How to select such neighbors?
a. All those that we know of
b. Those that we are currently connected to
c. A random number of known peers
Best option: those that we are connected to + some random to
reach a quota (for example 10 or so)
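The "best option" above could look like the following sketch. The function
name and the peer representation (any hashable id) are hypothetical.

```python
import random

def pick_broadcast_targets(connected, known, quota=10, rng=random):
    """Return the peers to forward new information to.

    connected: peers we currently have an open connection to.
    known: all peers we know of for this shard (may include connected ones).
    """
    targets = list(connected)
    connected_set = set(connected)
    # Top up with random known-but-unconnected peers until the quota is met.
    candidates = [p for p in known if p not in connected_set]
    missing = max(0, quota - len(targets))
    targets += rng.sample(candidates, min(missing, len(candidates)))
    return targets
```

If we are already connected to more peers than the quota, all of them are
kept and no random peer is added.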
Partial merges, background pulls, caching (cache, req: bg, HARD)
-----------------------------------------
Don't even bother to pull all pages in the background, and don't
require storing all the pages we depend on. Replace that with a cache
of the pages we recently/frequently used + a way of distributing the
storage of the pages over all nodes.
To distribute the pages over all peers, we can use a DHT for example
or some kind of rendezvous hashing. Rendezvous hashing is reliable
but requires full connectivity. To alleviate that, we can have only
a subset of nodes participate in the distributed storage; they then
become the superpeers that everyone calls to get pages. Pages can
still be broadcast between secondary peers to alleviate the load on
the superpeers. Basically the superpeers are only called for
infrequently used pages, for example those of old data that is only
kept for archival purposes.
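For reference, rendezvous (highest-random-weight) hashing fits in a few
lines. This is only a sketch: the function name, the use of SHA-256, the
string node ids and the replica count are assumptions.

```python
import hashlib

def storage_nodes(page_hash, nodes, replicas=3):
    """Return the nodes responsible for storing a page.

    Every node gets a pseudo-random weight derived from (node, page);
    the nodes with the highest weights win.
    """
    def weight(node):
        digest = hashlib.sha256(f"{node}:{page_hash}".encode()).digest()
        return int.from_bytes(digest, "big")
    return sorted(nodes, key=weight, reverse=True)[:replicas]
```

Every peer that knows the same node list computes the same answer without
coordination, and removing a node that was not selected for a page does not
move that page.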
Signed stuff, identity management (sign, MED)
---------------------------------
We want all messages that are stored in our data structures to have
a correct signature from a certain identity.
We can have a special "identity" shard type that enables storing
profile information such as nickname or other information that we
might want to make public.
Proof-of-concept: shard for private chat between two people.
User groups and access control (groups, req: sign, HARD)
------------------------------
Groups with member lists, roles, etc. Use these as access control
lists for some shards.
Enforce access control in two ways: only push information to peers
that have proven they are a certain identity, and encrypt this data
with a secret key that all group members share.
Trust lists (trust, req: sign, MED)
-----------
In their profile (identity shard), people can rate their trust of
other people. This information can be combined transitively to
evaluate the trust of any individual.
Maybe we can make a distributed algorithm for a more efficient
calculation of these trust values, open research question.
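One possible reading of "combined transitively", purely as an illustration:
direct ratings are numbers in [0, 1], trust along a chain of people is the
product of the ratings, and the best chain wins. This rule, the function
name and the data layout are all assumptions; the sketch is a local
computation and says nothing about the distributed version.

```python
import heapq

def transitive_trust(ratings, source):
    """Compute the best-path trust from `source` to everyone reachable.

    ratings: dict person -> dict of person -> direct trust in [0, 1].
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated trust
    while heap:
        neg, person = heapq.heappop(heap)
        trust = -neg
        if trust < best.get(person, 0.0):
            continue  # stale heap entry, a better path was already found
        for other, direct in ratings.get(person, {}).items():
            combined = trust * direct
            if combined > best.get(other, 0.0):
                best[other] = combined
                heapq.heappush(heap, (-combined, other))
    return best
```

With ratings alice -> bob 0.9, alice -> carol 0.2 and bob -> carol 0.5, the
chain through bob (0.9 * 0.5) beats alice's direct rating of carol.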
Automated access control based on trust (auto, req: trust, groups, HARD)
---------------------------------------
Automated algorithms that take the trust values into account in
access control decisions (obviously these can only run when an
identity with admin privilege is running).
Shard dependency management (dep, MED)
---------------------------
Some shards may pull in other shards under certain conditions. For example,
a stored folder shard will just be a list of other shards, all of which we pull in.
We want a way to have toplevel shards, shards that are dependencies of
toplevel shards, and shards that we keep for a number of days as a cache but
will expire automatically.
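The three kinds of shards above suggest a simple "what do we keep" rule,
sketched here. The function name, the data structures and the 7-day default
are hypothetical.

```python
import time

def shards_to_keep(toplevel, deps, cache, now=None, ttl_days=7):
    """Return the set of shards that should not be dropped.

    toplevel: set of shard ids the user explicitly wants.
    deps: dict shard id -> iterable of shard ids it pulls in.
    cache: dict shard id -> timestamp (seconds) of last use.
    """
    now = time.time() if now is None else now
    keep = set()
    stack = list(toplevel)
    # Walk the dependency graph starting from the toplevel shards.
    while stack:
        shard = stack.pop()
        if shard in keep:
            continue
        keep.add(shard)
        stack.extend(deps.get(shard, ()))
    # Cached shards survive only until their time-to-live runs out.
    ttl = ttl_days * 86400
    keep |= {s for s, last_used in cache.items() if now - last_used < ttl}
    return keep
```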
COMPLETED TASKS
===============
Block store root & GC handling (gc, QUITE EASY)
------------------------------
We want the block store app to be aware of what blocks are needed
or not. The Page protocol already implements dependencies between
blocks.
The block store keeps every page for a given delay after it has been
put. Once the delay has passed, pages are purged unless they are
required by a root we want to keep.
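The purge rule can be sketched as a mark-and-sweep over page dependencies.
The function name and the in-memory layout are hypothetical and do not
describe the actual block store.

```python
def purge_pages(pages, roots, now, delay):
    """Return the hashes of the pages that can be purged.

    pages: dict hash -> (put_time, list of dependency hashes).
    roots: hashes of the roots we want to keep.
    """
    needed = set()
    stack = [h for h in roots if h in pages]
    # Mark: everything reachable from a kept root is needed.
    while stack:
        h = stack.pop()
        if h in needed:
            continue
        needed.add(h)
        stack.extend(d for d in pages[h][1] if d in pages)
    # Sweep: purge pages that are past the delay and not needed.
    return {h for h, (put_time, _) in pages.items()
            if h not in needed and now - put_time >= delay}
```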
Partial sync/background pull for big objects (bg, req: gc, QUITE EASY)
--------------------------------------------
Implement the copy protocol as a lazy call that launches the copy
in background.
Remove the callback option from MerkleSearchTree.merge so that a
merge does not require pulling all the data. The callback can instead
be called only on the items that are new among the last n (e.g. 100)
items of the resulting tree. This is not implemented in the MST but
in the app that uses it, since it is application-specific.
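On the application side, the "last n items" rule could look like the sketch
below. A plain ordered list stands in for the merged tree; the function
name and its parameters are hypothetical.

```python
def notify_recent_new(merged_items, old_items, callback, n=100):
    """Call `callback` on items that are new and among the last n.

    merged_items: ordered list of items after the merge.
    old_items: set of items we already had before the merge.
    """
    for item in merged_items[-n:]:
        if item not in old_items:
            callback(item)
```

Items older than the last n are never touched, so their data does not need
to be pulled for the merge to complete.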
Fix page hash semantics (pagehash, EASY)
-----------------------
Store and transmit the term binary, not the term itself, so that
serialization is unique and the hash is the same everywhere.
Terminology: stop using the word "block", use "page" everywhere.
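A small illustration of why the hash should be taken over the stored bytes.
JSON stands in here for whatever term serialization the project uses, and
both function names are hypothetical.

```python
import hashlib
import json

def encode_page(term):
    """Serialize once; the resulting bytes are what gets stored and sent."""
    return json.dumps(term).encode()

def page_hash(page_bytes):
    """Hash the stored bytes directly, never a re-serialization of the term."""
    return hashlib.sha256(page_bytes).hexdigest()

# Two encodings of equal terms can differ byte for byte (here: key order),
# which is why peers must exchange the bytes and not re-encode the term.
first = encode_page({"a": 1, "b": 2})
second = encode_page({"b": 2, "a": 1})
same_term = json.loads(first) == json.loads(second)
same_hash = page_hash(first) == page_hash(second)
```

Here `same_term` is true while `same_hash` is false: hashing a
re-serialized term could yield different hashes on different peers.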