Yesterday I posted part 1 of this series where I started off by syncing a bitcoin core bitcoin node and extracting a large event log from the node through the local JSON-RPC API. As I start writing this post, the extraction script is still running, the last events extracted are from Jul 2021, the event log file is almost 200 GB in size, containing about two billion event lines. In a few hours the dump should be complete and I'll be able to run the second script that I'm going to be writing in this blog post.
Before we do that, lets revisit the format of the event log that we are currently creating.
* 1NRy9s4vEFSVgNKJAoj3uMg5EjMvJrVBC 2015-03-20T17:36:11 348439 449 spent: address 0.022443
* 1NRy9s4vEFSVgNKJAoj3uMg5EjMvJrVBC 2015-03-20T17:36:11 348439 449 spent: address 0.0225
* 1NRy9s4vEFSVgNKJAoj3uMg5EjMvJrVBC 2015-03-20T17:36:11 348439 449 spent: address 7e-05
* 1NRy9s4vEFSVgNKJAoj3uMg5EjMvJrVBC 2015-03-20T17:36:11 348439 449 spent: address 0.03958
* 19P5Xur6HrDboK2QFYTzz7mk9fCaz89x17 2015-03-20T17:36:11 348439 449 recv: address 0.05
* 1NRy9s4vEFSVgNKJAoj3uMg5EjMvJrVBC 2015-03-20T17:36:11 348439 449 recv: address 0.034393
* 1NRy9s4vEFSVgNKJAoj3uMg5EjMvJrVBC 2015-03-20T17:36:11 348439 450 spent: address 7e-05
* 1NRy9s4vEFSVgNKJAoj3uMg5EjMvJrVBC 2015-03-20T17:36:11 348439 450 spent: address 0.034393
* 1NRy9s4vEFSVgNKJAoj3uMg5EjMvJrVBC 2015-03-20T17:36:11 348439 450 spent: address 0.01028
* 1NRy9s4vEFSVgNKJAoj3uMg5EjMvJrVBC 2015-03-20T17:36:11 348439 450 spent: address 0.0182
* 19P5Xur6HrDboK2QFYTzz7mk9fCaz89x17 2015-03-20T17:36:11 348439 450 recv: address 0.062743
* 19fYAQMWrTn4wrz621c7XRJmZDUsiPrM2h 2015-03-20T17:36:11 348439 451 spent: address 0.00364538
* 19fYAQMWrTn4wrz621c7XRJmZDUsiPrM2h 2015-03-20T17:36:11 348439 451 spent: address 0.00364517
* 1BdCZJNdAvmZudqsJxP1Uw3UcvFmQFbb9f 2015-03-20T17:36:11 348439 451 recv: address 0.00384187
* 12begeBfqG7Agqk22Wpg4sXomZeXTG463t 2015-03-20T17:36:11 348439 451 recv: address 0.00334868
* 19fYAQMWrTn4wrz621c7XRJmZDUsiPrM2h 2015-03-20T17:36:11 348439 452 spent: address 0.00364622
* 1GS8Wnark2aABZdijp2R7cTJvdErSDksS3 2015-03-20T17:36:11 348439 452 spent: address 0.00379273
* 13TurioKTknCUFXArcQGmHVfVWeqWVsQmN 2015-03-20T17:36:11 348439 452 recv: address 0.00372619
* 1KLiyWmVu3RnsfMCaUvPVGXAJwXoVViUdF 2015-03-20T17:36:11 348439 452 recv: address 0.00361276
The format after the first asterix consists of:
- The bitcoin address of the signing key this event relates to
- The block timestamp
- The height the block is at
- The index of the transaction within the block
- The event type (spent: or recv:)
- The way the signing key was adressed in the event (address or pubkey)
- The value represented in this event
Unique adresses
In 2022 it was reported that bitcoin had used a total of 1 billion unique adresses, so we should expect to be processing a little bit more than that. We are going to be maintaining a a dictionary with basic adress state, and even if that state is kept to a minimum, we have the address (34 characters plus 41 characters python string overhead), two dates at least (19 characters each plus 41 python string overhead each) and a floating point number (24 bytes) to work with.
#!/usr/bin/python
import sys
a = float("123.456")
b = "2015-03-20T17:36:11"
c = "2015-03-21T14:26:33"
statesize = sys.getsizeof(a) + sys.getsizeof(b) + sys.getsizeof(c)
d = [a, b, c]
listsize = sys.getsizeof(d)
emptylistsize = sys.getsizeof([])
e = "19P5Xur6HrDboK2QFYTzz7mk9fCaz89x17"
keysize = sys.getsizeof(e)
f = {}
f[e]=d
dictsize = sys.getsizeof(f)
emptydictsize = sys.getsizeof({})
print("statesize", statesize)
print("listsize", listsize)
print("emptylist", emptylistsize)
print("keysize", keysize)
print("dictsize", dictsize)
print("emptydictsize", emptydictsize)
The results
statesize 144
listsize 80
emptylist 56
keysize 75
dictsize 184
emptydictsize 64
It seems clear that the sizes reported by sys.getsizeof cant be deep sizes. While the dict probably contains at least the key, if we are pesimistic about the rest, each uniqu address could take up as much as statesize + listsize + dictsize or 408 bytes, so at more than 1 billion of adresses that would be 408 GB of memory we would be needing. My system doesn't have that much memory, so we need to chop up our work load a bit to keep memory requirements in check.
If we realize that the base58 bitcoin adresses, except for the first character, have 58 possible values, then we can divide processing of the event log into 58 runs, each run only processing the events for what the second cahracter of the address has a given character value.
This way each run will require something around 7 GB of RAM to run in, what is still quite a lot, but well within the bounds of what my system has.
Base running structure
#!/usr/bin/python3
import math
from datetime import datetime
exposed_uso_val = 0.0
unexposed_uso_val = 0.0
start_time = datetime.now()
counts = {}
for secondchar in "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz":
print("Run 1" + secondchar)
trackers = {}
with open("output.log") as logfile:
for line in logfile:
if "\n" in line and line[3] == secondchar:
...
print("Tracked", len(trackers), "used addresses")
for key,val in trackers.items():
...
print("Exposed USO value so far :", exposed_uso_val)
print("Unexposed USO value so far :", unexposed_uso_val)
print(datetime.now() - start_time)
print("Total exposed USO value :", exposed_uso_val)
print("Total unexposed USO value :", unexposed_uso_val)
...
I've cut out two core and a trailer part parts from the script to make the general stucture more clear, we will get to the missing pieces in a minute. In the abouve code we see the big for loop that runs the core functionality 58 distinct times. Each of the 58 runs the event file is opened and processed, but all lines not matching the second character are skipped. Each run starts of with an empty trackers dict. This is the dict that will take up a lot of RAM. After the trackers dict is filled, the second part of the core code itterates over all trackers items.
Per adress tracking.
if "\n" in line and line[3] == secondchar:
_, addr, tim, height, txno, event, typ, amount = line.split(" ")
spent = event == "spent:"
amount = float(amount)
old_amount, old_exposure_time, old_active_time = trackers.get(addr, [0.0, None, None])
if spent:
new_amount = old_amount - amount
if new_amount < 0.0:
new_amount = 0.0
exposure_time = old_exposure_time or tim
trackers[addr] = [new_amount, exposure_time, tim]
else:
new_amount = old_amount + amount
trackers[addr] = [new_amount, old_exposure_time, tim]
The split command spilt an event log line into its relevant parts. Each adress gets three fields of state:
- The current amount of bitcoin bound to the address.
- The time that the public key was first exposed on the chain through the spending of an USO
- The time of the last participation of this adress in a transaction
There are two types of events in the event log. Receive events and spent events. On a spent event the amount decreases while on receive events it increases. Both event types will change the activity field in the state, but only the first spent event will set the exposure time to a non Null value that it will keep untill the end of the run.
Filtering the state
for key,val in trackers.items():
amount, exposure_time, active_time = val
if exposure_time is None:
unexposed_uso_val += amount
else:
exposed_uso_val += val[0]
if not exposure_time is None and val[0] >= 0.01:
layear = int(val[2].split("-")[0])
if layear < 2020:
print("*", int(100000*val[0])/100000, val[1], val[2], key)
ckey = str(math.pow(10, math.floor(math.log(val[0])/math.log(10)))) + "-" + str(layear)
if ckey not in counts:
counts[ckey] = [0, 0.0]
counts[ckey][0] += 1
counts[ckey][1] += val[0]
This code does three things. It keeps track of the total (cross all 58 runs) USO value on chain in two distinct buckets:
- USOs bound to unexposed adresses
- USOs bound to exposed addresses
Please note that the event log only contains P2PKH adresses, no SegWit or Taproot.
The second thing it does is that it looks for state of adresses that are:
- Exposed through a spent event
- Contain at least 0.01 bitcoin, curently worth about $1,080
- Have been dorment since at least 2020
And finaly, the thirs thing it does is that it keeps track of the number of exposed adresses and their total funds per year of last activity and range of total balance.
Some trailer code
for myear in range(2009,2026):
print(myear)
for log in range(7, -7, -1):
ckey = str(math.pow(10,log)) + "-" + str(myear)
if ckey in counts:
val = counts[ckey]
rng1 = float(math.pow(10,log))
rng2 = rng1 * 10
print("+", rng1, "..", rng2, ":", val[1], "held in", val[0], "adresses")
The above trailing code produces an unsorted report about some basic stats ot he vulnerable adresses. It processes thye counts struct created in the previous code and writes a little report.
Output
The output of step two consists of two main parts. The first part is a list of many of hundreds of thousands of lines representing exposed adresses that have been inactive since at least 2020 and that have a balance of at least 0.01 bitcoin. Here is a short sample:
* 0.5 2010-10-28T00:38:36 2010-10-28T00:38:36 112B8k6JRuPhTgWwVbfYB1uVvgH3bZDorY
* 0.01 2011-03-24T18:06:51 2011-03-24T18:06:51 114WxDkkxakjXJztRgtHQjvsBj3JHukX9Z
* 0.02 2011-04-28T22:39:42 2011-04-29T19:10:56 11zJ4N6DVBtGMJk4n1Kf1LhUrBZ8XHUcx
* 0.70011 2011-05-13T02:22:59 2011-08-17T16:31:40 113Pev2GjjB6qtVsnhkqBwbGnne32UWa5h
* 0.18 2011-05-23T22:21:04 2013-12-12T11:09:15 115N34vq6fkMBmWHBUqmxDLRRJB19GVaAe
* 8.16074 2011-06-12T05:13:53 2011-09-28T10:18:55 11pkP8WevFseavrDBiN5j4YuyZzbJDVZ3
* 7.51 2011-06-06T15:43:41 2011-06-06T15:43:41 113gXkmwZPRv5LKPzfRsdaTmyVNGZdwosH
* 0.19999 2011-06-08T08:54:54 2011-07-27T04:00:57 11vsayPsAuwZVUq1jA1qVRmXowG4XWQNK
* 0.18 2011-06-10T08:54:19 2011-09-06T14:50:53 115EETqfhjLLHF1R8zHF9EC9duvrpiWHsP
* 0.8942 2011-07-06T10:39:12 2013-04-27T00:19:27 114NPfihghsADEZeYQgXgSHFmHUvsvXXdo
* 0.01854 2011-06-28T12:26:13 2013-09-17T02:36:18 114GMunZKebrumXs873eDuMQmNaT8Px6Mv
* 2.0 2011-06-19T04:01:34 2011-10-12T13:27:48 113UJA4rwDJ52RLoxGsr9nqV1XKpQQJBLZ
* 4.59999 2011-07-10T15:38:20 2011-10-22T10:42:48 114gmo4oKTx1qXTjTCVGHaZCEVqMeZpAPD
* 1.02 2013-02-24T00:19:04 2013-02-24T00:19:04 115wsiDt5STiwWrvH6JXAF2mMqQhRUejof
* 0.02261 2011-07-08T17:13:24 2011-08-20T01:17:01 114EmA6SfURASn3nYFdfpBw4WuZ5Lj4jZc
* 0.04238 2014-02-19T22:36:19 2014-02-19T22:36:19 114r65EcB2GddLLc5Ya3m9Weo7YvXcXrBg
* 0.051 2011-07-24T17:48:19 2014-09-06T15:35:37 114GZRHRFYSGc6geG2vXsD9kzRX8r2HKvM
* 4.68617 2011-07-29T20:35:18 2011-10-12T10:18:09 113GNMJCUJXjohatQ6JPavQVht3jZCTsN8
* 0.26 2011-09-07T01:30:38 2014-02-08T03:28:50 113accejbLubaL2cspBaQnoqN5x1KUoqYH
* 0.02999 2013-03-27T07:11:34 2017-11-19T09:14:51 112LWKR3ATmRYE4VoXWHTVtvYsShfkeDED
Each line starts off with the balance followed by two moments in time and followed by an address. The first moment in time is the moment when the public key was frst exposed on the blockchain. The second moment is the moment the address was last part of any transaction. Note that these moments can be the same or they can be years apart.
When we go on to our third step for our Bitcoin Quantum Kanarie, we are going to use these lines, as these are the adresses we want to monitor for new transactions that comultaive might indicate a large scale quantum blockchain heist taking place.
The second thing our script outputs is a nice overview of the data, not just the data on exposed adresses within these constraints of balance and inactivity, but the data on all USOs bound to exposed public keys. The script is currenly still running and will take about 10 more hours to complete. What is more, the script from my previous post missed out on Segwit and Taproot adresses. I'm going to have to go back and run that script once more for Segwit and Taproot, then run today's script once more on those results.
When that is done, I'll do another blog post here discussing the two overviews.
Some big ones so far
Just to get an idea, these are some big ones we found some far that would pe prime candidates for a criminal with a huge quantum computer.
* 100.0 2010-12-09T21:58:32 2010-12-09T21:58:32 12spqcvLTFhL38oNJDDLfW1GpFGxLdaLCL
* 123.77385 2013-01-22T19:13:36 2013-11-23T01:52:32 132HTmabmeB1SKcsJ4Y7M4RTz8Vu9ADUzH
* 109.0001 2016-11-16T02:28:32 2016-11-20T01:19:25 15SYyLY7KFa7HwYpmutTV92nbmAsUML4Kb
* 109.96116 2019-09-18T17:49:20 2019-09-23T16:36:00 17yMQg1MXk4V1f4PugjfqFuuoTcpmCZ9Lk
* 100.0 2013-02-18T20:48:21 2013-08-28T20:31:42 1D3ksSz2RfkbPPiPy9F7pkeD2MAAgvvtvF
* 245.89133 2019-06-28T16:50:23 2019-06-28T16:50:23 1DY1v67dpgcKeg9YRXd1V965b9Fm6GoThe
* 123.34005 2014-02-20T06:43:52 2014-10-27T20:42:29 1FHzuXhUPEvyhxiqhm5PVDjNNJwsHHCnMC
* 100.0 2018-03-23T10:52:07 2018-03-26T15:51:31 1Hx3HR7owoUjESNnvURt1pWUJ7429okd5L
Let's look at the bigest one.
I think this illustrates a huge part of the problem. While the exposing transaction doesn't use the source address as return adress, the other USOs bound to this adress are still exposed because the address was reused as a receive adress multiple times, so all USOs should have been added to the outgoing transaction here, though the real problem is reuse of adresses.
In the next post we are going to look at the complete overview of exposed QC vulnerable funds on the bitcoin blockchain. After that we can make a start at an actual kanarie bot.