Three-player mixed equilibrium

in #gametheory6 years ago

You probably saw this one coming. Can we take the approach at https://steemit.com/gametheory/@markgritter/two-player-mixed-equilibrium-in-the-curation-game and scale it up to three players?

Let's first calculate the payoffs for three equal-sized upvotes:

first weight = sqrt(1) - sqrt(0) = 1.000000
second weight = sqrt(2) - sqrt(1) = 0.414213
third weight = sqrt(3) - sqrt(2) = 0.317837

first payoff = 0.577350
second payoff = 0.239146
third payoff = 0.183504

A strategy will now be a pair of numbers (X,Y): upvote in minute X if nobody else has already voted. If one player has voted, wait for minute Y. It turns out that X < Y in practice but I didn't think this was necessarily the case; it might be that "panic voting" as soon as somebody else does was a viable strategy.

This gives 15 * 15 pure strategies which is a lot but not too many. The fictitious play can proceed, although slowly. However, once we have identified which strategies actually get used, we can eliminate the rest and substantially speed up the computation. (I didn't check whether these strategies are strictly dominated or just not part of the optimal mix; probably at least some of them are dominated.)

Results

After 10,000 iterations on the reduced strategy set:

Final strategy:
0.000007 (5, 12)
0.000007 (5, 13)
0.000407 (9, 14)
0.003907 (8, 14)
0.008206 (6, 14)
0.008506 (7, 14)
0.012706 (9, 13)
0.062701 (8, 12)
0.079099 (5, 14)
0.106496 (8, 13)
0.121695 (6, 12)
0.125895 (7, 12)
0.230984 (6, 13)
0.239383 (7, 13)
Game value=0.183727051384

On the final iteration, the difference between the mixed strategy and the best pure response was about 0.000002:

v=0.183729 p=0.183731 n=10000, max_s=(6, 12)

(We could now run this strategy against all the pure strategies we left out, just to make sure, but I didn't bother.)

As expected, the game-theoretic optimal is just barely better than always accepting third place. The most frequently-used pure strategies try to upvote in minute 6 or 7, and if they are beaten, then upvote in minute 12 or 13.

Does this tell us anything about real curation strategy? Probably not, because as noted before the population of people upvoting is unknown, and many upvoters will not be strategic. I do find it interesting that the optimal strategy is to vote quite early, though, rather than waiting for minute 10+. Like a poker tournament, the gap between first and second and third is too big to not play aggressively.

Code

# Pure strategies for three-player game:
# S_ij = upvote at minute i, if no upvote already exists,
#        upvote at minute j, if one player already voted,
#        otherwise wait for 15 minutes

# Three equal-sized votes:
# First  weight = sqrt(1) - sqrt(0) = 1.000000
# Second weight = sqrt(2) - sqrt(1) = 0.414213
# Thrird weight = sqrt(3) - sqrt(2) = 0.317837
#
firstPayoff = 0.577350
secondPayoff = 0.239146
thirdPayoff = 0.183504

# Pure srategies are named by pairs (i,j) and mixed strategies
# implemented as dictionaries.

def payout( sA, sB, sC ):
    # Payout is from A's perspective
    (ai, aj) = sA
    (bi, bj) = sB
    (ci, cj) = sC
    
    if ai < bi and ai < ci:
        # A goes first
        return firstPayoff * ai / 15
    if ai < bi and ai == ci or ai < ci and ai == bi:
        # A ties for first
        return ( firstPayoff + secondPayoff ) * 0.5 * ai / 15
    if ai == bi and bi == ci:
        # Three-way tie for first
        return (1.0 / 3.0 ) * ai / 15

    # A is not first
    # Is there a tie?
    if bi == ci:
        return thirdPayoff

    # Swap so B is first
    if ci < bi:
        (ci, cj) = sB
        (bi, bj) = sC

    # Who wins second?
    # is there a tie?
    if aj == cj:
        return ( secondPayoff + thirdPayoff ) * 0.5 * aj / 15
    if aj < cj:
        return secondPayoff * aj / 15
    if aj > cj:
        return thirdPayoff
    
    assert False, "Should have covered all possilibities"

#_pureStrategies = [ (i,j) for i in xrange( 1, 16 ) for j in xrange( 1, 16 ) ]
#v=0.183712 p=0.184002 n=281, max_s=(7, 12)
#v=0.183728 p=0.183967 n=282, max_s=(6, 13)
#v=0.183656 p=0.183915 n=283, max_s=(7, 13)
#v=0.183687 p=0.183891 n=284, max_s=(6, 13)
#v=0.183615 p=0.183902 n=285, max_s=(5, 12)
#0.003512 (5, 14)
#0.003512 (8, 14)
#0.003512 (9, 14)
#0.007009 (6, 14)
#0.007009 (7, 14)
#0.010505 (9, 13)
#0.027988 (5, 12)
#0.048967 (5, 13)
#0.059456 (8, 12)
#0.108407 (8, 13)
#0.122393 (6, 12)
#0.125890 (7, 12)
#0.230785 (6, 13)
#0.237778 (7, 13)

_pureStrategies = [ (5,14), (8,14), (9,14), (6,14), (7,14), (9,13), (5,12), (5,13), (8,12), (8,13), (6,12), (7,12), (6,13), (7,13) ]

def pureStrategies():
    return _pureStrategies

payoutTensor = {}

def cachedPayout( sA, sB, sC ):
    key = (sA, sB, sC)
    val = payoutTensor.get( key, None )
    
    if val is None:
        key = (sA, sC, sB)
        val = payoutTensor.get( key, None )
        
    if val is None:        
        val = payout( sA, sB, sC )
        payoutTensor[key] = val
        
    return val

import itertools

def purePayout( a, mixedBC ):
    value = 0.0
    for (b,c) in itertools.combinations_with_replacement( pureStrategies(), 2 ):
        if b == c:
            value += mixedBC[b] * mixedBC[b] * cachedPayout( a, b, b )
        else:
            # (b,c) and (c,b) are separate possibilities but they have the
            # same probability and payout
            value += 2.0 * mixedBC[b] * mixedBC[c] * cachedPayout( a, b, c )

    return value

def mixedPayoutAndOptimalPure( mixedA, mixedBC ):
    maxPayout = -1.0
    maxStrategy = None
    value = 0.0
    for a in pureStrategies():
        unweighted = purePayout( a, mixedBC )
        if unweighted > maxPayout:
            maxStrategy = a
            maxPayout = unweighted
        value += mixedA[a] * unweighted
    return ( value, maxStrategy, maxPayout )
    
def initialStrategy():
    ss = pureStrategies()
    numStrategies = len( ss )
    return dict( (s, 1.0 / numStrategies ) for s in ss )

def weightPureStrategy( s, i, weight ):
    dw = 1.0 - weight
    newWeights = dict( ( k, dw * x ) for (k, x ) in s.iteritems() )
    newWeights[i] += weight
    return newWeights

def strategyRep( s ):
    return "[" + " ".join( "{:.6f}".format( s[i] ) for i in pureStrategies() ) + "]"

def printFullStrategy( s ):
    allPure = [ (v,p) for (p,v) in s.iteritems() ]
    allPure.sort()
    for (v,p) in allPure:
        print "{:.6f} {}".format( v, p )

def fictitousPlay( numIterations ):
    s = initialStrategy()
    try:
        for n in xrange( 1, numIterations + 1 ):
            ( val, maxStrategy, maxPayout ) = mixedPayoutAndOptimalPure( s, s )
            print "v={:.6f} p={:.6f} n={}, max_s={}".format( val, maxPayout, n, maxStrategy )
            if n % 100 == 0:
                print strategyRep( s )
            adjustment = 1.0 / ( n + 1.0 )
            s = weightPureStrategy( s, maxStrategy, adjustment )
    except KeyboardInterrupt:
        pass
    
    print "Final strategy:"
    printFullStrategy( s )
    val, _, _ = mixedPayoutAndOptimalPure( s, s )
    print "Game value={}".format( val )
Sort:  

Very good analysis - this is the kind of thing I looked at when I started a curation strategy, and keep having to look at whenever the ecosystem is changed, such as HF20.

That is pretty fun.

I think somebody made an algorithm which tells you the best time to vote based on the previous post. Probably this give the most reliable result if someone has continuous support.

To reduce the complexity you could create a recursive game starting with voter 1 which pick an optimal time to vote, then voter 2 is added which picks an optimal time to while voter 1's voting time remains unchanged. Then voter 3 is added which picks an optimal time while voter 1 and 2 leave their voting time unchanged etc.

we love coding

Reply !stop to disable the comment. Thanks!Hello! Your post has been resteemed and upvoted by @ilovecoding because ! Keep up good work! Consider upvoting this comment to support the @ilovecoding and increase your future rewards! ^_^ Steem On!

I should probably stop using #programming until @ilovecoding fixes its bot not to upvote at minute 0, thereby burning the rewards.

Thanks, I think you still get the curation rewards and all others will return to the pool.

I get 75% of the value of the upvote, sure. But I'd rather that most of the 25% curation rewards go back to the people who upvoted me rather than the pool.