Introduction to VIVA, part 5 : VIVA CAN? You bet!



What's the solution to cloud blackouts and censorship? It's all in the CAN!

My good friend @thecryptofiend made a very interesting post today that I thought I would highlight and bring to everyone's attention so I answered it and resteemed the post.

If you read my comments there you'll notice that I purposefully avoided mentioning any sort of VIVA based answer, because I wanted to highlight what's here now.

But I thought I would take this opportunity to share with you the VIVA based answer, drawn from our own whitepaper, because this is coming in the next few weeks and it would have prevented the problems inherent in centralized content distribution networks.

But what actually is the problem here?
AWS has an outage, so what? How does that affect you?

Well, in this particular instance AWS S3 was being used by a large chunk of the internet as a file store.
When it went down, so did half the internet, because so much of the web was relying on it to serve their content.

Basically, people were taking for granted that Amazon Web Services is offered solely on a best-effort basis, and because of that they had no fallback plan when AWS started returning errors. In fact, had AWS just disappeared for whatever reason, there's a very good chance that much of the internet simply could not have recovered.

There's a very good chance that if you have a file backup solution running, that you're backing up to the same S3 service that just went out. You might want to check your data integrity, because there are no guarantees data wasn't lost or corrupted during this event.

So what happens if your family photos disappear forever?

The problem here isn't Amazon; a dozen competitors can and do pop up, and it still doesn't really solve the problem.

So what is the problem?

It's centralization. Too much content is consolidating in too few hands.

The reason for this is simple. Amazon and these other CDNs have vast amounts of resources sitting idle. These are resources that they can sell cheap and still make amazing amounts of profit on. This race to the bottom has caused massive consolidation around Amazon, Akamai, Cloudflare and a handful of others. You sitting at home on your computer could never even begin to compete.

It also isn't any good if your backup solution is cloud based, because you're at the mercy of a company. A company with shareholders and a company that must turn a profit and must comply with the laws of whatever draconian regime happens to be in place in whatever countries they decide to operate in.

Even if you discount the unlikely event that Amazon went completely out of business, all these services are still massively centralized. If you have something that may be offensive or god forbid copyrighted, they can censor you instantly and you'll lose your content, period.

But what about other solutions?

If you take a good look around, you'll find plenty of potential contenders that believe that they can solve this problem.

StorJ, IPFS, zeronet all have very good ideas but they suffer from some serious flaws as well.

StorJ - Requires a monthly contract to be paid or your data disappears: not just no longer accessible to you, but deleted from the network entirely.
IPFS - Speaks a protocol that is a mix of BitTorrent and git. It doesn't truly use the world wide web, and you need a special client app running in the background to interact with it.
zeronet - Like IPFS, but focuses on sites and adds a nice layer on top that handles domain name resolution within the .bit namespace handled by Namecoin. The big problem with zeronet is that it's built solidly on Namecoin: you need to register a custom .bit domain through Namecoin, and .bit isn't a valid Top Level Domain (TLD). Again, you need a custom application installed to use it at all.
Maidsafe - Who knows? It changes every few months.

This list gets really long, and they're all really nice tries. But the thing is they miss the most important aspect of solving the long term problems inherent in data storage.

If you want to defeat cloud based webservices, you need to be able to interface with the world wide web like they do.
That isn't possible if people need to download, install & configure some custom app and ensure it's running in the background all the time.

There just isn't a good way to distribute content if it's not coming straight off a valid top level domain.

So what does the best solution look like here?

How about a globally distributed peer-to-peer content caching network?

How would something like that work without installing a custom app?

With a simple browser extension of course!

Isn't this the same as installing an app?
No, because your website is still hosted on the world wide web at any address you want.
Users don't have to install it to access your site, but you can incentivize them to use the plugin because it makes money for them and for you.

With the VIVA Content Addressable Network or VIVA (CAN) plugin, every visitor to the site can seamlessly share their local cache of your content.

As the page loads, a hook fires in the browser. The URL of each resource (images, scripts, etc.) is hashed, and the VIVA network is queried for a "live version" of the resource.

If the resource is not found, then the plugin will download the content from the website, hash the URL, hash the content and upload it to the VIVA network.

How is this possible if I'm not running a server app on my machine?

This is the point where everyone else falls down, but the answer is so simple, you're going to be banging your head on your desk.

We use a websocket connection for peer discovery, but we serve all content, peer to peer directly over WebRTC.

An example of just how easy that is to accomplish is given here...
https://www.html5rocks.com/en/tutorials/webrtc/datachannels/

Ok so that's a good gloss, but what about some more details?

When the plugin first fires up, it connects via a websocket connection to mint(s) chosen by the end user.
It then broadcasts a message that announces "Hey I'm here!" along with a list of data_hashes it has on hand and a list of data_hashes it's still looking for if any.

{
  message_type: 'HELLO',
  data_discovered: [array of hashes],
  data_seeking: [array of DRCs]
}

This message is broadcast over the websocket to all connected clients, and then also relayed peer to peer via WebRTC.
Each client receives that message and keeps an index of who has which hashes, as well as which clients relayed the welcome message to it most quickly.

This allows every client to find nearby nodes that contain content they're looking for.
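A rough sketch of the index each client might keep while processing HELLO broadcasts. The rank-by-relay-speed heuristic is an assumption drawn from the description above, not the released plugin's code:

```javascript
// Peer index built from HELLO broadcasts: who holds which data_hash,
// and how quickly each peer relays messages to us.
class PeerIndex {
  constructor() {
    this.holders = new Map(); // data_hash -> Set of peer ids
    this.latency = new Map(); // peer id -> fastest relay time seen (ms)
  }

  // Record a HELLO relayed by `peerId`, received `elapsedMs` after broadcast.
  onHello(peerId, hello, elapsedMs) {
    for (const h of hello.data_discovered) {
      if (!this.holders.has(h)) this.holders.set(h, new Set());
      this.holders.get(h).add(peerId);
    }
    const best = this.latency.get(peerId);
    if (best === undefined || elapsedMs < best) this.latency.set(peerId, elapsedMs);
  }

  // Peers holding `hash`, nearest (fastest relay) first.
  nearestHolders(hash) {
    return [...(this.holders.get(hash) || [])]
      .sort((a, b) => (this.latency.get(a) ?? Infinity) - (this.latency.get(b) ?? Infinity));
  }
}
```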

Doesn't this reveal the URLs of the sites I'm visiting?
No, it doesn't.

VIVA is a content addressable network in which the hash of each URL serves as an index alongside the data hash.

What is to prevent someone from faking the data, i.e. hashing some random data but attaching it to a legitimate URL?

When new content is discovered, the upload process does NOT actually upload the data.
Instead, the URL is hashed, the content is downloaded from its original source, and a hash of the content is taken.
This goes into a "content claim manifest" (CCM), which looks like this...

{
  index_type: 'CCM',
  urlhash: SHA256 of URL,
  datahash: SHA256 of DATA,
  discovered: timestamp (now),
  expires: content_expiry date,
  discoveredby: viva account name,
  signature: signature of the VIVA account holder
}

This information is all that's known. The actual URL is NEVER stored.
A CCM is important information, but it's low value and useless for faking or tracking.

So why do it this way?
It's a way of announcing to the network that we have discovered content available at a particular location in the graph.
It enables rapid indexing. The URL hash isn't anything more than an additional attribute in the graph search.

Other nodes in the network can then request the raw data and perform their own hash on it to validate that the content is a match.
In the meantime, other nodes can check their own cache for the URL hash (if they have it already) and see whether the hash matches; if it doesn't, they can re-request the resource from its original source and validate whether the new hash is still valid or not.

If the new datahash is found to supersede the old datahash, the validating node gives what amounts to an upvote on the content, thereby lending its weight to the urlhash in the global index. If a node is found to be misbehaving and uploading junk, other nodes can silence it by downvoting it in the index. Thus when a node requests a given URL hash, the results returned are ordered by the combined weight that other nodes which have seen that URL have lent to the representative data hash. We call this a popularity index.

At this point the client should have multiple copies of the data sitting in their cache and they can lend their own weight, by upvoting the correct one.
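The up/downvote bookkeeping behind the popularity index might look like this; the flat +1/-1 weighting per vote is an illustrative assumption:

```javascript
// Popularity index: per urlhash, each candidate datahash accumulates
// the weight other nodes lend it via upvotes and downvotes.
class PopularityIndex {
  constructor() { this.weights = new Map(); } // urlhash -> Map(datahash -> weight)

  vote(urlhash, datahash, delta) {
    if (!this.weights.has(urlhash)) this.weights.set(urlhash, new Map());
    const m = this.weights.get(urlhash);
    m.set(datahash, (m.get(datahash) || 0) + delta);
  }

  upvote(u, d) { this.vote(u, d, +1); }
  downvote(u, d) { this.vote(u, d, -1); }

  // All known datahashes for a urlhash, most popular first.
  lookup(urlhash) {
    return [...(this.weights.get(urlhash) || new Map()).entries()]
      .sort((a, b) => b[1] - a[1])
      .map(([datahash]) => datahash);
  }
}
```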

Now as for requesting content, that comes through a "data request contract" (DRC), which looks like this...

{
  index_type: 'DRC',
  request_hash: ANY,
  contracts: [{
    contract data
  }]
}

A contract is an offer for payment, and you'll notice that there is no mention of who's paying for it.
In VIVA all smart contracts are JSON objects and anonymous by default.
What they do contain is a signature field.
Public keys are registered with mints. Each public key has a spending limit provisioned by the owner of the key, but this is not public information.

When a node sees a DRC it is interested in claiming, all it needs to do is complete the contract.
The default contract is called DATA_HASH_OF, and it's fulfilled by stapling on data that hashes to the request_hash and submitting it to their mint.

The mint first validates that the datahash is a match. If it is, then it checks its public key storage for a public key that matches the signature on any of the contracts. If it finds a match, then it debits that public key and credits the account of the fulfiller node.

If it doesn't find a signature match, then it blinds the DRC (deleting the content), places its own stamp on the contract (certifying it has the matching data), and begins forwarding it to other mints for their signatures.

Once a contract has been settled, then each requester with a valid signature is sent the raw data, those nodes can begin to relay the information back upstream.
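The mint-side settlement path just described can be sketched as follows. All names here are illustrative, and a real mint would verify actual cryptographic signatures rather than doing a keystore lookup:

```javascript
// Sketch of mint settlement: validate the submitted data, then either
// pay out against locally registered keys or blind-and-forward the DRC.
function settleClaim(drc, submittedData, sha256, keystore, ledger) {
  // Wrong data: the hash must match the request before anything is paid.
  if (sha256(submittedData) !== drc.request_hash) return { status: 'rejected' };

  // Contracts whose signatures match a public key registered with this mint.
  const local = drc.contracts.filter((c) => keystore.has(c.signature));
  if (local.length > 0) {
    for (const c of local) {
      ledger.debit(keystore.get(c.signature), c.amount); // charge the requester's key
      ledger.credit('fulfiller', c.amount);              // pay the fulfilling node
    }
    return { status: 'settled' };
  }

  // No local signature match: blind the DRC (drop the raw data), stamp it
  // as "data verified here", and forward it to other mints.
  return { status: 'forwarded', blinded: { ...drc, stamp: 'this-mint' } };
}
```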

In most cases this is going to be a 1:1 mapping of data to signatures; however, content that has been requested but not found can circulate for quite a while, gaining additional contracts with additional signatures and thus becoming more valuable. This can happen if it's been a long time since anyone on the network has seen data matching that hash.

So what does a contract for content look like?

{
  contract_type: "C4C",
  contract_operands: ['DATA_HASH_OF'],
  data_hash: hash of requested chunk,
  reference_hash: parent DRC,
  expires: some date in the future,
  amount: some quantity of VIVA, defaults to 0.01 VIVA,
  signature: signature data
}

So in this regard a Data Request Contract is really an array of contracts, and the longer it circulates, the more valuable it becomes.
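That accumulating-contract behavior can be sketched as follows. The reference_hash placeholder stands in for a real hash of the parent DRC, and amounts are plain numbers for illustration:

```javascript
// A DRC starts empty; each interested requester appends its own C4C offer.
function makeDRC(requestHash) {
  return { index_type: 'DRC', request_hash: requestHash, contracts: [] };
}

// Append a contract: the longer the DRC circulates unfulfilled,
// the more signatures (and VIVA) it carries.
function appendContract(drc, signature, amount = 0.01, expires = null) {
  drc.contracts.push({
    contract_type: 'C4C',
    contract_operands: ['DATA_HASH_OF'],
    data_hash: drc.request_hash,
    reference_hash: 'hash-of-parent-drc', // placeholder; would hash the DRC itself
    expires,
    amount,
    signature,
  });
  return drc;
}

// Total bounty a node could claim by producing the matching data.
function drcValue(drc) {
  return drc.contracts.reduce((sum, c) => sum + c.amount, 0);
}
```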

This is all well and good, but aren't we just moving the centralization to mints?

No, we aren't.
The data is still served by peers. Any node has the option of immediately responding with the correct data and claiming the contract at their leisure.

By being the first with the correct data, they have a legitimate claim to the funds of any contracts appended to the request; they've broadcast it to the whole network, and other nodes are signing off that they've seen it.

So they can freely submit the data directly upstream to their nearest nodes at the same time they are submitting the claim to their mint.
If a contract is claimed, it is ultimately the responsibility of the mint that executes the contract to ensure that the data begins circulating correctly.

The mint will do this within 30 seconds regardless of whether the contract claim has fully executed, i.e. all signatures found. This is optimistic execution, and it negates any reason the peer node would have for failing to submit the data.

Mints do not get paid for serving the data, but they must keep the data actively circulating for a minimum of 24hrs.

The final question then becomes: if everyone is connected to a mint, don't we have a traditional client/server architecture? What happens if the mint goes offline?
The client/server architecture applies only to payment processing and initial peer discovery. The data exchange aspects are 100% peer to peer; the mint is merely a data source of last resort, if for instance the client is firewalled.

As mentioned before, all mints are required to keep content for a minimum of 24hrs.
However special "data storage nodes" exist to specifically archive content, potentially forever.
The nodes are registered with the mint and may or may not be owned by the mint as an extra revenue source.
These are a data source of last resort, because those nodes will wait until the last second of the contract to supply information, or until the content age and content size versus fee have shifted to the point where it becomes worthwhile to respond. In the meantime, their primary purpose is to rapidly slurp up as much data as they can.

How do we know if data is really fresh?

If you right click on any website and view source, you're going to notice "meta" tags.
These meta tags are placed on the site by its owners in order to, among other things, tell search engines how long to cache the content.

Some examples are below...

<meta http-equiv="cache-control" content="max-age=0">
<meta http-equiv="cache-control" content="no-cache">
<meta http-equiv="expires" content="0">
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT">
<meta http-equiv="pragma" content="no-cache">

So we examine the meta tags and compare them to the cache age of the content received from peers. If the content has expired, then we fill out a new content claim manifest.
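A hypothetical version of that freshness check, handling only the meta tag forms listed in the examples above:

```javascript
// Decide whether a peer-cached copy is still usable, given the page's
// cache-related meta tags and the age of our cached copy in seconds.
// Only the http-equiv forms from the examples above are handled.
function isCacheUsable(metaTags, cacheAgeSeconds, now = Date.now()) {
  for (const tag of metaTags) {
    const name = tag['http-equiv'].toLowerCase();
    const value = tag.content.toLowerCase();
    if (name === 'pragma' && value === 'no-cache') return false;
    if (name === 'cache-control') {
      if (value === 'no-cache') return false;
      const m = value.match(/max-age=(\d+)/);
      if (m && cacheAgeSeconds > Number(m[1])) return false; // older than allowed
    }
    if (name === 'expires') {
      const t = value === '0' ? 0 : Date.parse(tag.content);
      if (!Number.isNaN(t) && t <= now) return false; // already expired
    }
  }
  return true; // no directive forbids it: serve the peer copy
}
```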

Why do we even have content claim manifests?

CCMs are stored in the blockchain and kept indefinitely, or until expiration. Mint owners set a fee split with CCM creators when they are trying to attract content hosts, and when data subject to a CCM is requested and fulfilled, the initial finder of the data gets their cut, regardless of whether they were the one that served the data on that request. In this way we reward content discovery, but only at the time it is being requested. It also helps to avoid over-duplication.
We want several copies of every piece of data circulating at all times, but it needs to strike a fine balance.

Something like Bootstrap would be in the user's local cache, but if the CDN serving it were offline, the user could request it for nearly free.
Whereas someone's YouTube upload of their child's birthday party isn't likely to be requested very often, and if for example YouTube took it down because the happy birthday song is copyrighted, the network would still have a cached copy of the video. While it might cost a dollar or two to revive it, at least it would be possible to do so, which is something not presently possible.

What if two people submit identical CCMs but with different datahashes?
The URL hash is merely an index; all data stored in the VIVA CAN is returned for each URL hash, automatically sorted by popularity.

Thus if there is a collision (which can happen, because data does change), the network returns the most popular result first.
It is up to the mint to correctly maintain the popularity index, invalidating entries at expiration, and a mint that doesn't do this properly is going to find itself quickly out of business.

Does this work for only web accessible content? What about my cat pictures?
You have a private space within VIVA where you can upload whatever you want and you can mark it public, private or paid.

If it's marked private it's AES encrypted and only you have the key, but the key can be regenerated because it's deterministic.

If it's marked public then the entire world can see it.
If it's marked "paid", then it's still encrypted, but it's encrypted with a mint key and a shared key.
The mint will only release the key upon execution of a contract for the key, which generally means someone paid the mint according to terms you set.
In this case, the key is sent ONLY to the direct requester.

For any of this content you can set a viva link, which looks like any other URL:
viva://username@mint/filename or filehash
or simply
viva://hash
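Parsing the two link forms above might look like this; the returned field names are illustrative:

```javascript
// Parse viva://username@mint/filename-or-filehash, or viva://hash
// (assumed here to be a 64-character hex SHA-256 digest).
function parseVivaLink(link) {
  const m = link.match(/^viva:\/\/(?:([^@\/]+)@([^\/]+)\/(.+)|([0-9a-f]{64}))$/i);
  if (!m) return null;
  if (m[4]) return { kind: 'hash', hash: m[4].toLowerCase() };
  return { kind: 'named', username: m[1], mint: m[2], file: m[3] };
}
```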

Again the neat thing about all of this is there's nothing external to download and run.
You just install the upcoming viva plugin in your browser, and you're good to go!

I hope this has gotten you excited about some of the features upcoming in VIVA.

Interested in learning more about VIVA?
Start with these links...
https://steemit.com/vivacoin/@williambanks/introduction-to-viva-a-price-stable-crypto-currency-with-basic-income-that-s-not-hypothetical
https://steemit.com/basicincome/@williambanks/introduction-to-viva-part-2-more-than-meets-the-eye
https://steemit.com/basicincome/@williambanks/introduction-to-viva-part-3-how-does-it-work
https://steemit.com/basicincome/@williambanks/introduction-to-viva-part-4-how-do-you-bootstrap-a-new-economy
This post is 100% steem powered!


I look forward to installing the VIVA plugin! I can see the value of distributed caching. Great work!

Thanks! We look forward to releasing it soon. Make sure to follow @vivacoin for all VIVA related announcements and me for descriptions of how it all works.

Congrats, you have been selected as Author of the Day by the Steemvoter (SV) Guild. Keep up the good work and help make Steem great!

Note: You should receive many guild votes in an hour or so, enjoy!

Wow! Thank you, I'm honored!

I think you're building the internet from my head universe. I have very little idea how it works so I'm glad other people do XD Keep going :D

Thanks so much! It's a lot of work even putting the protocols and whitepaper together, to speak nothing of the code to actually pull this off. It really helps me to keep going when people comment and say things like "keep going".

This is great. I look forward to trying the VIVA plugin

I'm looking forward to everyone giving it a try soonish too. I can't believe how things are finally coming together on this project. It's been an epic few years.

WOW! Amazing work, great plugin, I hope you will release it soon. So did I get it right: with VIVA you will be able to save copies of websites, even if they get deleted from the internet or become unreachable?

Yes, you can cache the internet as you surf if you want. It's important to note that it won't cache dynamically generated content; this is for safety reasons. You wouldn't want to log into your bank account and then have that info be the cache hit for the entire VIVA network. So there are some safeguards in place. But for static content, it's perfect.

Thank you, that's great work you are doing.

Wow, this is quite a headfull (a term I might just have invented; I'm still waking up), but it seems quite groundbreaking. I'm now motivated to save for a new computer to contribute to the network. Great job, upvoted and resteemed as usual.

Thank you, it's very much appreciated. Your support on this has been awesome.

Interesting concept and I shall take my time digesting from part 1 ...I missed the first 4 posts. Do you perhaps know which S3 availability zone or region was down?

News was saying us-east-1 and us-west-1, depending on the source.

Thanks, I found it on Techcrunch. We have our Datalake in AWS ... but not in the US....thankfully this time.

You should consider moving it to VIVA when we launch. You don't pay for storage, just retrieval and it gets globally sharded and distributed.

It's 2.3 petabytes and grows in excess of 10 terabytes a day .... would this be feasible? I have not yet read about VIVA except today's post.

Yes it's designed to handle a use case exactly like that. The CAN as internet content cache is just an example of one way to use it.

But the thing is, the entire VIVA data store is using an index size of 256 bits, so a petabyte is really nothing, we're not running out of space there.

Now, the real question is do we have the capacity and can it be provided cheaply and reliably.

The MedicAxess data store, which will run on top of VIVA, must store any type of medical modality on demand and has retrieval guarantees of 30 mins or less for aged data and 5 mins or less for newer data.

With MedicAxess you're talking about the complete medical records of the entire nation of Mexico, (MedicAxess is a joint venture between Imaxess, IMSS, & VIVA), a nation that is quickly shifting to digital records across the board including 4k live streams of surgery and other telemedicine applications. All of which must be constantly archived. We projected 1 PB per month growth on that.

So 310 TB per month shouldn't be an issue. It will mostly depend on how many people contribute storage resources to the network, which will mostly be a function of how well we get our message out. But network wise, yes we can most definitely handle the capacity.

But I'll bet we can reduce your storage expenses by an order of magnitude or more. Here's how this would work.

I just checked S3 pricing, looks like it's $24 per TB / per mo, not counting access charges. With VIVA it would be about $5.50 per TB per mo and that includes access charges.

That pricing assumes you're a Crown Holder, meaning you bought VIVA Crowns during one of our auctions, or during the ICO.

If you're not a Crown Holder, you'd need to buy VIVA on the open market to pay to access the data stored there, and the math on the back of my napkin here is saying that would be around $15/TB per mo.

Keep in mind, you only pay for retrieval of data, never storage. Your data lives on the network indefinitely, but the longer you go without accessing it, the more expensive it becomes to access.

I would love to discuss this with you if you would like to go to https://chat.vivaco.in/ sometime tomorrow, I should be there all day.

hi william, i have a side question. Did springsteem 2017 take place yet?

No, unfortunately there were literally 0 takers on the pre-sale tickets which meant that we lost our chance to place a deposit on our location.

So we cancelled this year's event and will try again next year, but this time with sponsors in place and more solid footing.

Thanks for asking, I've been meaning to put out an update on it.

okay, good that i asked. i am looking for steemit-related events in usa this year and came across springsteem. i believe you'll end up coming out strong when it eventually happens. Do you know of any other events happening in usa this year? I have heard of steemstock and summer steemit workshop!

Thanks for the compliment!

@alechahn puts on a festival in Tijuana called Imperfectu and they are accepting Steem Dollars this year if I recall correctly but I'm not sure. It's been awhile since I've paid much attention to the USA, but TJ is practically a suburb of San Diego.

I'm an American ex-pat living in Mexico and our entire business revolves around bringing crypto currencies to Latin America. I've had my head down so much I haven't thought to look up and look around. But I'll ping you if I find anything.

Easiest way to chat with me is to come over to https://chat.vivaco.in I pretty much live there 24/7.

okay, i have been trying to learn about viva. @deanliu commented it to be good stuff. in my case though, i still new to cryptostuff. steemit opened my eyes more to it. since steemit, i have different and lost one pc in the process. i am reading about viva currently. i hope to be there at the beginning. i seem to be always late since i have start cryto or blockchain enthusiasm. i will visit you in chat soon! thank you for your effort here.

You're very welcome and I hope to see you soon!

So here's a question - how could the CAN system be facilitated if the ISP itself is in on the game?

Relevant to me as I own and operate an independent ISP, and am 'still' hunting for the right platform to build the DAOWISP project. The ISP game is going 'local' due to tech advancements, and I think there's a huge potential for cooperation there for all the crypto-projects that's being overlooked.

I'm also working with Fort Galt ( @piedpiper ) and the VIVA system looks very promising for a few businesses & projects there!