KEEP IT SIMPLE, STUPID!
-----------------------

Ideas are all over the place for this project. When adding something, let's
make sure it's reasonable to do so and doesn't burden the user with some
overcomplicated concepts. We want the minimal and most elegant solution that
achieves our goals.


TASK LIST
=========

- High priority: invite, dht, groups
- Medium priority: net


Invitation system (invite, MED)
-----------------

Current issue: if a user PMs another user, the recipient is not notified!
They have to pull the PM shard voluntarily to get the messages.

A user may generate invitation tokens: an invitation token is a signature of
the invited user's pk by the inviting user's sk.  The user who redeems the
token writes some data that is saved temporarily in the user's identity shard,
which might be stored by a third-party node, for instance if we are on mobile
devices that cannot connect directly.


Networking improvements (net, HARD)
-----------------------

Here are some things to keep in mind that we want at some point:

- RPS (random peer sampling)
- Congestion control, proper multiplexing of feeds
- Proper management of open connections to peers

RPS question: can we favor connections to peers that share the same shards,
while still preventing the network from becoming disconnected?
Ex: keep 100 total open connections that are sampled by proximity on the set of requested shards (bloom filter),
plus 2 to 5 fully random ones across all shards.
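
A rough sketch of that selection rule, in Python rather than the
project's own language. Everything here is an assumption made for
illustration: the function name, exact shard sets instead of bloom
filters, Jaccard similarity as the "proximity" measure, and a plain
top-n ranking instead of weighted sampling.

```python
import random

def select_connections(my_shards, peers, n_close=100, n_random=3, rng=random):
    """peers: dict peer_id -> set of shard ids that peer advertises."""
    def similarity(shards):
        # Jaccard similarity between our shard set and the peer's.
        union = my_shards | shards
        return len(my_shards & shards) / len(union) if union else 0.0

    # Prefer the peers whose shard sets are closest to ours.
    ranked = sorted(peers, key=lambda p: similarity(peers[p]), reverse=True)
    close = ranked[:n_close]
    # A few uniformly random peers from the rest keep us linked to
    # parts of the network that hold other shards.
    rest = ranked[n_close:]
    extra = rng.sample(rest, min(n_random, len(rest)))
    return close, extra
```

With bloom filters the similarity would only be an estimate, which is
probably good enough for ranking.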


DHT to find peers for a given shard (dht, EASY)
-----------------------------------

First option: use a library for MLDHT, makes everything simple but
makes us use UDP which does not work with Tor (can fix this later).

Second option: custom DHT protocol (we probably won't be doing this
anytime soon, if ever at all)


Partial merges, background pulls, caching (cache, req: bg, HARD)
-----------------------------------------

Don't even bother to pull all pages in the background, and don't require
storing all the pages we depend on. Replace that with a cache of the pages
we recently/frequently used + a way of distributing the storage of
the pages over all nodes.

To distribute the pages over all peers, we can use a DHT for example
or some kind of rendezvous hashing. Rendezvous hashing is reliable
but requires full connectivity. To alleviate that, we can have only
a subset of nodes participate in the distributed storage; they then
become the supernodes that everyone calls to get pages. Pages can
still be broadcast between secondary peers to lighten the load on
the supernodes. Basically the supernodes are only called for
infrequently used pages, for example those of old data that is only
kept for archival purposes.
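
For reference, a minimal Python sketch of rendezvous (highest random
weight) hashing over the set of supernodes. The function names, the use
of SHA-256 and the replica count are assumptions, not decisions taken
for this project.

```python
import hashlib

def score(node_id, page_hash):
    # Each (node, page) pair gets a pseudo-random but reproducible weight.
    digest = hashlib.sha256(f"{node_id}:{page_hash}".encode()).digest()
    return int.from_bytes(digest, "big")

def storing_nodes(page_hash, supernodes, replicas=2):
    # The nodes with the highest weights for this page are the ones
    # responsible for storing it.
    ranked = sorted(supernodes, key=lambda n: score(n, page_hash), reverse=True)
    return ranked[:replicas]
```

Everyone who knows the same supernode list computes the same answer
without any coordination, and removing a node that was not responsible
for a page does not move that page.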


User groups and access control (groups, req: sign, HARD)
------------------------------

Groups with member lists, roles, etc. Use these as access control
lists for some shards.

Enforce access control in two ways: only push information to peers
that have proven they hold a certain identity, and encrypt this data
with a secret key that all group members share.


Trust lists (trust, req: sign, MED)
-----------

In their profile (identity shard), people can rate their trust of
other people. This information can be combined transitively to
evaluate the trust of any individual.

Maybe we can make a distributed algorithm for a more efficient
calculation of these trust values, open research question.
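
One possible (centralized, non-distributed) way to combine ratings
transitively, sketched in Python. The rule used here is an assumption:
ratings are numbers in [0, 1], trust along a chain of ratings is their
product, and the trust of an individual is the best chain from us to
them. Other combination rules are just as plausible.

```python
import heapq

def transitive_trust(ratings, source):
    """ratings: dict rater -> {ratee: score in [0, 1]}."""
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated trust
    while heap:
        neg, who = heapq.heappop(heap)
        trust = -neg
        if trust < best.get(who, 0.0):
            continue  # stale entry, a better chain was already found
        for other, score in ratings.get(who, {}).items():
            candidate = trust * score
            if candidate > best.get(other, 0.0):
                best[other] = candidate
                heapq.heappush(heap, (-candidate, other))
    return best
```

For example, if alice rates bob 0.5 and carol 0.25, and bob rates carol
0.75, then alice's trust in carol is 0.375 (through bob) rather than
0.25.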


Automated access control based on trust (auto, req: trust, groups, HARD)
---------------------------------------

Automated algorithms that take into account the trust values in
access control decisions (obviously these can only run when an
identity with admin privilege is running).



COMPLETED TASKS
===============

Block store root & GC handling (gc, QUITE EASY)
------------------------------

We want the block store app to be aware of what blocks are needed
or not. The Page protocol already implements dependencies between
blocks. 

The block store keeps all pages that have been put for a given
delay. Once the delay is passed, the pages are purged if they are
not required by a root we want to keep.
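
The purge rule can be pictured as a mark-and-sweep pass. This is only a
Python illustration with made-up names and plain dicts; it does not
claim to mirror the actual block store code.

```python
def collect(pages, deps, roots, now, delay):
    """pages: hash -> time the page was put; deps: hash -> child hashes.

    Returns the set of page hashes that can be purged.
    """
    # Mark: everything reachable from a root we want to keep.
    live, stack = set(), list(roots)
    while stack:
        h = stack.pop()
        if h not in live:
            live.add(h)
            stack.extend(deps.get(h, []))
    # Sweep: purge pages that are past the delay and not reachable.
    return {h for h, put_at in pages.items()
            if h not in live and now - put_at > delay}
```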


Partial sync/background pull for big objects (bg, req: gc, QUITE EASY)
--------------------------------------------

Implement the copy protocol as a lazy call that launches the copy
in background.

Remove the callback possibility in MerkleSearchTree.merge so that
pulling all the data is not required for a merge. Instead, the
callback is only called on the items that are new among the last n
(ex. 100) items of the resulting tree. This is not implemented in the
MST but in the app that uses it, since it is application-specific.
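
On the application side this could look roughly like the Python sketch
below. Plain ordered lists stand in for the trees, and the function name
and arguments are hypothetical.

```python
def notify_new_items(old_items, merged_items, callback, n=100):
    # Only look at the last n items of the merged result, so older
    # parts of the tree never have to be pulled.
    old = set(old_items)
    for item in merged_items[-n:]:
        if item not in old:
            callback(item)
```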


Fix page hash semantics (pagehash, EASY)
-----------------------

Store and transmit the term binary and not the term itself so that
we are sure serialization is unique and hash is the same everywhere.
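
To illustrate why the bytes matter, here is a small Python sketch. JSON
stands in for the term binary and SHA-256 for the page hash; both are
assumptions made only for the example.

```python
import hashlib
import json

def make_page(term):
    # Serialize exactly once; these bytes are what gets stored and sent.
    data = json.dumps(term).encode()
    return hashlib.sha256(data).hexdigest(), data

def verify_page(page_hash, data):
    # Hash the received bytes as-is; never re-serialize the decoded term.
    return hashlib.sha256(data).hexdigest() == page_hash
```

Two byte strings can decode to the same term and still hash
differently, which is why the hash has to be defined on the stored
bytes and not on the term.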

Terminology: stop using the word "block", use "page" everywhere.


Signed stuff, identity management (sign, MED)
---------------------------------

We want all messages that are stored in our data structures to have
a correct signature from a certain identity.

We can have a special "identity" shard type that enables storing
profile information such as nickname or other information that we
might want to make public.


Architecture for communication primitives (comm, MED)
-----------------------------------------

- Encrypted point to point communication (to communicate private info after ACL check) -> SHS
- Flooding, gossip  -> netgroups


Private chat (privchat, MED)
------------

Proof-of-concept for private things: shard for private chat between two people.


Epidemic broadcast (ep, EASY)
------------------

When a shard receives new information from a peer, transfer that
information to some other neighbors.

How to select such neighbors?

a. All those that we know of
b. Those that we are currently connected to
c. A random number of known peers

Best option: those that we are connected to + some random to
reach a quota (for example 10 or so)

Implementation: netgroups (lib/net/groups.ex)
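
The "best option" rule, as a Python sketch. This is not the netgroups
code, and the names and the default quota are only illustrative.

```python
import random

def broadcast_targets(connected, known, quota=10, rng=random):
    # Always forward to every peer we are currently connected to.
    targets = list(connected)
    missing = quota - len(targets)
    if missing > 0:
        # Top up with random known peers until the quota is reached.
        already = set(connected)
        others = [p for p in known if p not in already]
        targets += rng.sample(others, min(missing, len(others)))
    return targets
```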


Shard lifetime and dependency management (dep, MED)
----------------------------------------

Some Shards may pull other shards in, under certain conditions. For example
a stored folder shard will just be a list of other shards, all of which we pull in.

We want a way to have toplevel shards, shards that are dependencies of
toplevel shards, and shards that we keep for a number of days as a cache but
will expire automatically.