Status MVP: Status Core Contributors use Status #7

Closed · 14 tasks
fryorcraken opened this issue Nov 22, 2022 · 23 comments

fryorcraken commented Nov 22, 2022

Deadline: Beginning of December

High-level requirement: 120-130 people use Status Communities over Waku v2.

Details

Client Diversity

  • All users use Status Desktop

Network Connectivity

  • Clients are mostly online
    • Laptops on during working hours
    • Offline during the night
    • Some app reboots
  • Assumed mostly stable internet connection (WiFi + DSL/Fibre?)

Network Topology

  • Only fleet nodes provide store, light push and filter services
  • Only TCP transport is used.
  • Users connect to the fleet and to each other; Waku Relay is used.
    • Users are behind NAT devices.
    • Connections drop semi-regularly (see Network Connectivity)
    • Low upload bandwidth for one user means low download bandwidth for another user
    • Discovery is needed to find other users.

Details

  • Confirm that 1/2/3 below are not needed and that we currently have enough inbound connectivity thanks to discv5 + UPnP to support a healthy network.
  1. AutoNat when clients come online (a minimal reachability-detection sketch in Go follows this list):
  • go-waku: Use AutoNat when the client comes online. Use the public IP address for discovery if it succeeds.
  • go-waku (conditional to outcome): if most clients in the network are reachable and can therefore successfully be discovered, the few clients that can't could discover random peers and make outgoing connections only using Waku Peer Exchange (at least in the meantime).
  2. If AutoNat fails, use AutoRelay:
  • go-waku: discover and initiate circuit-relay connections to random peers if (1) failed
  • nwaku: enable libp2p circuit relay (already supported)
  • NAT-less discovery mechanism required here to discover relay addresses:
    • Option 1: libp2p rendezvous
      • nwaku: enable/integrate libp2p rendezvous (already supported in nim-libp2p)
      • go-waku: implement libp2p rendezvous client (should not be too complicated)
    • Option 2: libp2p kad-dht
      • nim-libp2p: implement kad-dht (significant effort)
      • nwaku: integrate and enable libp2p kad-dht
      • go-waku: integrate and enable libp2p kad-dht
  3. DCUtR: hole punching to create a direct connection
  • go-waku: enable/integrate
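To make step (1) more concrete, here is a minimal sketch (not the actual go-waku implementation) of how a node could react to AutoNat results using go-libp2p's event bus; the option set and the fallback actions are illustrative assumptions.

```go
package main

import (
	"context"
	"fmt"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/event"
	"github.com/libp2p/go-libp2p/core/network"
)

func main() {
	ctx := context.Background()

	// Start a host with port mapping enabled. libp2p.NATPortMap() attempts
	// UPnP/NAT-PMP mappings; the AutoNat client (reachability probing) is
	// enabled by default on go-libp2p hosts.
	h, err := libp2p.New(libp2p.NATPortMap())
	if err != nil {
		panic(err)
	}
	defer h.Close()

	// Subscribe to reachability changes reported by AutoNat.
	sub, err := h.EventBus().Subscribe(new(event.EvtLocalReachabilityChanged))
	if err != nil {
		panic(err)
	}
	defer sub.Close()

	for {
		select {
		case <-ctx.Done():
			return
		case e := <-sub.Out():
			ev := e.(event.EvtLocalReachabilityChanged)
			switch ev.Reachability {
			case network.ReachabilityPublic:
				// Reachable: advertise the public address for discovery
				// (e.g. in the ENR used by discv5).
				fmt.Println("publicly reachable: advertise address for discovery")
			case network.ReachabilityPrivate:
				// Not reachable: make outgoing connections only (e.g. peers
				// learned via Waku Peer Exchange), or move on to steps (2)/(3).
				fmt.Println("not reachable: use peer exchange / circuit relay fallback")
			}
		}
	}
}
```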

There is no one-size-fits-all solution for NAT; it will be an iterative process based on dogfooding feedback.
Possible other workaround:

  1. Help Status CCs enable UPnP on their routers if AutoNat fails.

Roadmap

  • AutoNat in go-waku.
  • Waku Peer Exchange: go-waku <> go-waku/nwaku
  • AutoRelay (may not be needed)
    • libp2p rendezvous: nwaku + go-waku (as client)
    • libp2p kad-dht: nim-libp2p + nwaku + go-waku (may not be needed)
  • DCUtR: go-waku (may not be needed)
  • Status CCs enable UPnP (may not be needed)

Connection Numbers

  • Target 150 nodes.

  • Each node has at least one connection with a bootstrap node. Should we assume two?

  • Status Client to confirm expected usage of Status Web.

Roadmap

  • Fleet can handle the expected number of connections

Availability

  • Confirm current nwaku uptime on Status prod thanks to Canary
  • Get sign off from Status client.

Waku Store

Store Data Volume

  • Extract from Status Discord to know the expected volume of messages
    • # of messages in 30 days
    • Total size of 30 days of messages

Store Query Frequency

  • Assuming the pattern defined in Network Connectivity

    • Mostly 72-hour queries (laptop off over the weekend)
    • 30-day queries on occasional app reset
  • Peak of queries when the app starts at the beginning of the work day; Monday is highest due to the weekend overlap. ~90 CCs in Europe.

  • Need to understand total volume of queries based on # of communities, channels, messages and contacts

  • Status Client to confirm expected usage of Status Web.

Store Query Format

  • Status Client to provide a list of exact store query formats to ensure that nwaku unit tests cover all scenarios (# of content topics, cursor +/- time filter, etc.)
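To make that request concrete, here is a hedged sketch of the kind of query shape that would need to be enumerated. The struct below only mirrors the fields described in the Waku store protocol spec; it is not go-waku's or nwaku's actual API, and the pubsub topic, page size and time window are illustrative assumptions.

```go
package store

// HistoryQueryExample mirrors the fields of a Waku store history query as
// described in the store protocol spec. It is illustrative only and does not
// correspond to a concrete go-waku/nwaku type.
type HistoryQueryExample struct {
	PubsubTopic   string   // pubsub topic the messages were published on
	ContentTopics []string // one or many content topics per query
	StartTime     int64    // Unix time in nanoseconds
	EndTime       int64    // Unix time in nanoseconds
	PageSize      uint64   // paging: results per page
	Ascending     bool     // paging: direction
	Cursor        []byte   // paging: opaque cursor from the previous page, if any
}

// Example: the "laptop was off over the weekend" case, a ~72-hour query over
// a handful of content topics. The pubsub topic and page size are assumptions.
func seventyTwoHourQuery(contentTopics []string, nowNs int64) HistoryQueryExample {
	const seventyTwoHoursNs = 72 * 60 * 60 * 1_000_000_000
	return HistoryQueryExample{
		PubsubTopic:   "/waku/2/default-waku/proto",
		ContentTopics: contentTopics,
		StartTime:     nowNs - seventyTwoHoursNs,
		EndTime:       nowNs,
		PageSize:      100,
		Ascending:     false,
	}
}
```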

Roadmap

  • Confirm expected data volume/frequency/format
  • Review SQLite upper bound performance (from published benchmarks)

Issues:

Peer Behaviour

  • Peers mostly behave correctly
  • Tracking of peers with poor bandwidth/connectivity, or peers that cannot accept inbound connections, may be needed.
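One building block that could help here (not something decided in this issue) is gossipsub's built-in peer scoring, which penalises peers that misbehave or deliver poorly. A rough go-libp2p-pubsub sketch, with entirely illustrative parameter values:

```go
package main

import (
	"context"
	"time"

	"github.com/libp2p/go-libp2p"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
	"github.com/libp2p/go-libp2p/core/peer"
)

// newScoredGossipSub builds a gossipsub router that scores peers and stops
// gossiping to / graylists badly scored ones. All numeric values below are
// placeholders; real values would have to come from dogfooding or simulation.
func newScoredGossipSub(ctx context.Context) (*pubsub.PubSub, error) {
	h, err := libp2p.New()
	if err != nil {
		return nil, err
	}

	params := &pubsub.PeerScoreParams{
		AppSpecificScore:       func(peer.ID) float64 { return 0 },
		DecayInterval:          12 * time.Second,
		DecayToZero:            0.01,
		RetainScore:            10 * time.Minute,
		BehaviourPenaltyWeight: -1,
		BehaviourPenaltyDecay:  0.99,
	}
	thresholds := &pubsub.PeerScoreThresholds{
		GossipThreshold:   -100,  // stop exchanging gossip with this peer
		PublishThreshold:  -500,  // stop publishing to this peer
		GraylistThreshold: -1000, // ignore the peer entirely
	}

	return pubsub.NewGossipSub(ctx, h, pubsub.WithPeerScore(params, thresholds))
}
```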

Bridging

  • Status Client to confirm the suggestion from the offsite to disable v1 <> v2 bridging

Peer Persistence

  • Status Client to specify whether discovered peers should be persisted across restarts (NB: remember the mandatory gossipsub backoff period here).
fryorcraken commented Nov 22, 2022

Also, the Network Topology section needs clarification.

  • (b) @jm-clius Can you please clarify whether my interpretation is correct, and confirm/deny whether (2) is feasible and what steps/dogfooding would be needed to make it happen (e.g. dogfood the upper limit on the number of nwaku connections)?

Once this is done, we can ask Status Client team to:

  • (c) Check the assumptions above for the December deadline and provide clarifications/corrections if necessary.
  • (d) Acknowledge the Network Topology proposal

Finally, edit the description to:

  • (e) track any blocking issues relevant to each topic
  • (f) track the dogfooding/sign-off needed by Status Client team for each topic (@fryorcraken can help handle that)

@jm-clius

Network Topology

Work effort for Topology 1: Waku Relay

For (1) I think the scope and order of work is (roughly):

  1. AutoNat when clients come online:
  • go-waku: Use AutoNat when the client comes online. Use the public IP address for discovery if it succeeds.
    Note: I think go-waku already supports the same NAT traversal techniques as nwaku (UPnP, NAT-PMP — @richard-ramos can confirm?). This step gives us an idea of whether the next steps are necessary/urgent (perhaps most clients are successfully reachable with existing techniques).
  • go-waku (conditional to outcome): if most clients in the network are reachable and can therefore successfully be discovered, the few clients that can't could discover random peers and make outgoing connections only using Waku Peer Exchange (at least in the meantime).
  2. If AutoNat fails, use AutoRelay (conditional to outcome of 1):
  • go-waku: discover and initiate circuit-relay connections to random peers if (1) failed
  • nwaku: enable libp2p circuit relay (already supported)
  • NAT-less discovery mechanism required here to discover relay addresses:
    • Option 1: libp2p rendezvous
      • nwaku: enable/integrate libp2p rendezvous (already supported in nim-libp2p)
      • go-waku: implement libp2p rendezvous client (should not be too complicated)
    • Option 2: libp2p kad-dht
      • nim-libp2p: implement kad-dht (significant effort)
      • nwaku: integrate and enable libp2p kad-dht
      • go-waku: integrate and enable libp2p kad-dht
  3. DCUtR: hole punching to create a direct connection
  • go-waku: enable/integrate

This solution would then need to be targeted for dogfooding under various scenarios.
Note that there's no one-size-fits-all solution for NAT and restrictive networking conditions, so data gathering (e.g. running AutoNat) to see which clients are publicly reachable and which traversal techniques work will be part of the effort. Perhaps helping contributors enable UPnP on their routers if AutoNat fails could be an intermediate, Status-internal step?
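For reference, a rough sketch of how the circuit relay (AutoRelay, step 2) and DCUtR hole-punching (step 3) pieces fit together in go-libp2p terms, assuming a recent go-libp2p. This is not the actual go-waku integration; the relay candidates would come from discovery (rendezvous, Waku Peer Exchange or kad-dht) or fleet configuration.

```go
package main

import (
	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/peer"
)

// newHostBehindNAT sketches a host that keeps the existing NAT traversal
// (UPnP/NAT-PMP), answers AutoNat dial-back probes for other peers, and can
// fall back to circuit relay and DCUtR hole punching when it is not publicly
// reachable. relayCandidates is supplied by the caller.
func newHostBehindNAT(relayCandidates []peer.AddrInfo) (host.Host, error) {
	return libp2p.New(
		libp2p.NATPortMap(),       // UPnP / NAT-PMP port mapping, as already supported
		libp2p.EnableNATService(), // serve AutoNat dial-back requests for other peers
		libp2p.EnableAutoRelayWithStaticRelays(relayCandidates), // (2) reserve slots on known relays
		libp2p.EnableHolePunching(),                             // (3) DCUtR: upgrade relayed connections to direct ones
	)
}
```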

Work effort for Topology 2: Waku Filter, Waku Lightpush

This topology is much riskier and has many more unknowns. Neither of these protocols has been target-tested through dogfooding, scalability is unknown, some client-side redundancy is required, etc. For this to get to production, I can think of at least the following outstanding items likely to be important in the protocol itself:

  • method for filter client to check state, remove, refresh or update an existing subscription
  • ACK mechanism for filter subscription requests
  • connectivity investigation (e.g. clients only able to make outbound connections), which may either require keeping the outbound connection open or NAT techniques

This implies updates to the protocols, implementation changes for go-waku and nwaku, and targeted dogfooding.

I don't think it's feasible to support this topology within a short time frame. More on this ongoing effort here and here.

@richard-ramos

AutoNat when clients come online

I confirm that go-waku supports UPnP / NAT-PMP (tested, and also confirmed by @cammellos, as he was able to reach my machine).

Regarding "the few clients that can't, could discover random peers and make outgoing connections only using Waku Peer Exchange": if we were to expose Peer Exchange to status-go, what would be the criteria to choose which peer-exchange node should be used to request nodes from? Should it be chosen randomly from the fleet nodes?

If AutoNat fails, use AutoRelay (conditional to outcome of 1) - Option 1: libp2p rendezvous

I had an implementation of a libp2p rendezvous client and server that I removed recently in waku-org/go-waku#351, which also changed it to use ENRs instead of signed peer records. If necessary, it can be added back with that change reverted so it uses libp2p signed peer records.

If AutoNat fails, use AutoRelay (conditional to outcome of 1) - Option2: libp2p kad-dht

While go-libp2p has an implementation of kad-dht available (https://pkg.go.dev/github.com/libp2p/go-libp2p-kad-dht), it needs to be integrated into go-waku.
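To give an idea of the integration effort, here is a minimal sketch of wiring go-libp2p-kad-dht into a libp2p host and using it for peer discovery, assuming a recent go-libp2p. The namespace string is a placeholder and this is not go-waku code.

```go
package main

import (
	"context"
	"fmt"

	"github.com/libp2p/go-libp2p"
	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/host"
	drouting "github.com/libp2p/go-libp2p/p2p/discovery/routing"
	dutil "github.com/libp2p/go-libp2p/p2p/discovery/util"
)

func discoverViaDHT(ctx context.Context, h host.Host) error {
	// Create and bootstrap the DHT; ModeAuto lets publicly reachable nodes
	// act as DHT servers and NATed nodes act as clients.
	kadDHT, err := dht.New(ctx, h, dht.Mode(dht.ModeAuto))
	if err != nil {
		return err
	}
	if err := kadDHT.Bootstrap(ctx); err != nil {
		return err
	}

	// Advertise and find peers under a shared namespace.
	// "waku/2/example" is a placeholder namespace, not a defined Waku value.
	disc := drouting.NewRoutingDiscovery(kadDHT)
	dutil.Advertise(ctx, disc, "waku/2/example")

	peers, err := disc.FindPeers(ctx, "waku/2/example")
	if err != nil {
		return err
	}
	for p := range peers {
		if p.ID == h.ID() || len(p.Addrs) == 0 {
			continue
		}
		fmt.Println("discovered peer:", p.ID)
		_ = h.Connect(ctx, p) // best-effort dial
	}
	return nil
}

func main() {
	h, err := libp2p.New()
	if err != nil {
		panic(err)
	}
	defer h.Close()
	_ = discoverViaDHT(context.Background(), h)
}
```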

@jm-clius

If we were to expose Peer Exchange to status-go, what would be the criteria to chose which peer-exchange-node should be used to request nodes from? should it be chosen randomly from the fleet nodes?

@richard-ramos, indeed. The risk is of course that this has not been dogfooded, but it's not a resource-intensive protocol (a cached set of random peers should be available immediately upon request).

@jm-clius

@richard-ramos the idea under Topology (1) is based roughly on what was discussed before, with some additions (e.g. the proposed discovery methods, using Waku Peer Exchange, etc.). Do the work items and their order roughly make sense to you? Do we already know, maybe via AutoNat, roughly what proportion of clients are affected by unsupported NAT traversal?

@richard-ramos

The work items do make sense. Currently I'm doing a poll in #waku-e2e to get an idea of the status of NAT across clients.

@fryorcraken

If AutoNat fails, use AutoRelay (conditional to outcome of 1) - Option 1: libp2p rendezvous

Are you saying that if AutoNat fails for an individual node, then it uses AutoRelay (you mentioned peer exchange too)? Or are you saying that if the technology overall fails, then the backup plan would be to use AutoRelay?

@fryorcraken

@jm-clius considering that we aim for p2p connections between Status Desktop instances, we can imagine that some peers will provide poor connection quality (high latency, low bandwidth). How can we scope this as part of this milestone?

@fryorcraken changed the title from "Status MVP: Status CC use Status" to "Status MVP: Status Core Contributors use Status" on Nov 23, 2022
@jm-clius

Are you saying if AutoNat fails for an individual node then they use AutoRelay (or you mentioned peer exchange too). Or are you saying that if the technology overall fails, then the backup plan would be to use AutoRelay?

The former. In other words, the circuit-relay-to-hole-punching procedure can be triggered if the client determines it's not publicly reachable (via AutoNat). I'm also saying that if AutoNat shows that existing NAT traversal techniques generally work for most clients, this AutoRelay/hole-punching procedure may not be critical for this milestone (those clients can connect to others after discovering them using Waku Peer Exchange, for example).

Then we can imagine that some peers will provide poor connection quality (high latency, low bandwidth). How can we scope this as part of this milestone?

It depends: our ultimate solution for such peers is filter and lightpush, which is out of scope. These can be enabled as experimental features and dogfooded already, but with the understanding that they are beta features. An intermediate step would be to e.g. use Waku Peer Exchange as a light discovery mechanism and attempt to replenish connectivity in this way. Relay can be surprisingly resilient, as it includes "error mechanisms" such as the IWANT/IHAVE control checks. This would imply some extra latency for such clients.

@fryorcraken

Looks like AutoNat is not needed: https://docs.google.com/spreadsheets/d/1xgtSQpIUB1k1aIenSF_wpc4zfXuuykKbX5Ne4SNaOyw/edit?usp=sharing

@Menduist could you help me summarize the requirements for a healthy p2p network here? @jm-clius mentioned that 25% of nodes need to accept incoming connections; is that for a healthy gossipsub with D=6?

@fryorcraken

Another topic not discussed is the expectation of node availability.
@alrevuelta I believe you are able to pull some stats from the canary node? What do we have at the moment for the status.prod fleet, please? Let's say the past 7 days.

@fryorcraken

@richard-ramos: Yeah, yesterday I created a fix for an issue related to Discovery v5. After go-waku acquired the external address, it was not updating the ENR for Discovery v5. Hence, the nodes were not being discovered. I opened a PR fixing that, and now the number of peers that you can connect to, assuming you have open UPnP or NAT-PMP enabled, has increased by a lot. (23 Nov)

It looks like NAT traversal strategies are not needed for this milestone, as the connectivity issue was related to discv5. @richard-ramos can you please confirm and provide a reference to the PR?

@fryorcraken

@LNSD What kind of information are we able to extract from the Status Prod fleet SQLite? E.g. current DB size?
@alrevuelta is it possible, using the metric node, to get the number and size of messages (traffic volume) sent on Status' content topics?


alrevuelta commented Nov 29, 2022

Another topic not discussed is the expectation of node availability.
@alrevuelta I believe you are able to pull some stats from the canary node? What do we have at the moment for the status.prod fleet, please? Let's say the past 7 days.

I don't have the metric right now, but I'm planning to report on it (having problems discovering peers with the network monitor tool). Is this what you are referring to?

  • number of nodes we can connect to
  • total number of discovered nodes

@alrevuelta is it possible using the metric node to get the number and size of messages (traffic volume) sent on Status' content topics?

Not exactly the same but this is what I have right now:

One interesting finding regarding the traffic:

  • Traffic really decreases from 12:00 to 7am CET.
  • We can see some pattern of usage, with high usage from 8am to 3am CET (EU+USA working hours?)
  • And very low usage during weekends.
    [image]

Will continue with this; hope this helps for now.


Menduist commented Nov 29, 2022

Looks like AutoNat is not needed

AutoNat is part of the Hole-Punching stack, so it is required

could you help me here summarize the requirements for a healthy p2p network?

This spreadsheet is based around the percentage of the network that a type of node can reach.
If that is 1%, it means that 1% of the network becomes a hotspot that will bottleneck (assuming that type of node is frequent enough).
It's a simplification of reality, but it should at least give some good intuitions.

25% comes from D_out / D_high = 3 / 12. That's the absolute minimum, since peers will always keep at least D_out connections in their mesh (for sybil-protection reasons), so they need at least 25% outgoing connections. (If you reverse this, they become a bottleneck if we expect them to take >75% of their connections as incoming.)
I don't have a good figure on the required percentage to be "healthy" (that probably requires simulations); 25% is the lowest-possible lower bound.

75% is apparently the best we can hope for given network shares, UPnP popularity & Hole-Punching success rates.
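For reference, a back-of-the-envelope check of those figures, assuming the gossipsub mesh parameters quoted above (D_out = 3, D_high = 12); the values are the ones mentioned in this thread, not measurements.

```go
package main

import "fmt"

func main() {
	// Gossipsub mesh parameters quoted above.
	const dOut = 3.0   // minimum outbound connections a peer keeps in its mesh
	const dHigh = 12.0 // upper bound on mesh degree

	minOutboundShare := dOut / dHigh // 0.25: every peer needs >= 25% outbound connections
	maxInboundShare := 1 - minOutboundShare

	fmt.Printf("minimum outbound share per peer: %.0f%%\n", minOutboundShare*100)
	fmt.Printf("bottleneck expected above ~%.0f%% inbound connections\n", maxInboundShare*100)
}
```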


LNSD commented Nov 29, 2022

@LNSD What kind of information are we able to extract from Status Prod fleet sqlite? e..g current DB size?

Let's talk about the Waku archive (the Waku store message persistence backend): SQLite is just one of the possible persistence backend drivers.

The currently exposed metrics are:

  • Number of stored messages
  • Message insertion duration
  • Persistent storage query time
  • Message validation errors: At this moment, message timestamps are checked by the Waku archive implementation. Messages outside the [now-20s, now+20s] range are discarded and reported invalid.


jm-clius commented Nov 29, 2022

Another topic not discussed is the expectation of node availability. @alrevuelta I believe you are able to pull some stats from the canary node? What do we have at the moment for the status.prod fleet, please? Let's say the past 7 days.

Availability over the last 7 days has been 92.23%. See this report.

And note that for the last 5 days it's higher (~95%), likely due to more config and other improvements. Report here.


richard-ramos commented Nov 29, 2022

@fryorcraken
Reference PR for DiscV5 fix in go-waku: waku-org/go-waku#368
in status-go: status-im/status-go#2972


oskarth commented Jan 13, 2023

How is this issue different from #8? Can one be closed?

@jm-clius

This issue was for the Desktop users. #8 is for launching on Mobile too. Closing this.


oskarth commented Jan 16, 2023

Oh gotcha
