Reverse Engineering: Pokemon GO

Datetime: 2016-08-23 03:09:42          Topic: Golang

While you're reading this, keep in mind that I'm available for hire! If you've got a JavaScript project getting out of hand, or a Golang program that's more "stop" than "go," feel free to get in touch with me. I might be able to help you. You can find my resume here.

You’ve probably heard of Pokemon GO by now. It’s become incredibly popular, very quickly. It’s currently making over a million dollars per day from in-app purchases for Nintendo and Niantic, and caused Nintendo’s stock price to rocket upwards in the weeks following the launch. Let’s talk about the internals a bit.

Pokemon GO is built using the Unity Game Engine, targeting Android and iOS. Most of the game assets are readily accessible, and include interesting things such as alternative pokeball icons (including a master ball!), different items, and even a McDonald’s logo. Some of the script data is also available, but I haven’t looked at that yet. I was more interested in the network protocol.

Communication between the Pokemon GO app and the backend servers is performed via HTTPS, but the app doesn’t do certificate pinning, so it’s trivial to MITM the data if you control the device. Requests and responses are encoded using Google’s Protocol Buffers, and seem to form a sort of RPC system.

There’s an initial handshake performed with pgorelease.nianticlabs.com/plfe, which seems to assign a channel for that session. This handshake includes your authentication token, your location, a timestamp, and some binary blobs. The blobs are very high-entropy, so it’s likely that they’re signatures or some other kind of encrypted data. Subsequent communications are performed with a URL contained in the handshake response, which currently takes the form of pgorelease.nianticlabs.com/plfe/nnn, where nnn is a number. These numbers were rather small (rarely above two digits) soon after release, and have been growing since. It’s my conjecture that they map to individual backend servers, and that the initial handshake serves as a sort of load balancing step.

As of right now, there are some pretty decent schema definitions for most of the protocol, but when I started messing with it there were none. I wrote a tool to help me interpret the protobuf data without a schema. I’ve named it protofudger.

Let’s start with a bit of a crash course in protocol buffers’ encoding rules. The long, detailed version is available in the official Protocol Buffers encoding documentation.

Protobuf messages are made up of a series of key/value pairs. Keys are numeric, and each value’s type is identified by a three-bit tag in the key. Numeric values are mostly “varint” types - this is a variable-width integer encoding, where smaller values take fewer bytes. There are two additional numeric types: 32-bit and 64-bit fixed-width. These can hold floating point or integer numbers of any sign. The spec says they’re supposed to be little-endian, but don’t count on it all the time. There’s a variable-length type as well, which can be used to hold strings, byte arrays, or embedded messages. Embedded messages are simply serialized with protobuf and stuffed into variable-length byte arrays.
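
To make the key layout concrete, here is a minimal Go sketch that decodes one key/value pair by hand. It leans on the standard library's binary.Uvarint, which reads the same base-128 varint format that protobuf uses. The splitKey helper is a name made up for this example. The two input bytes are the first two bytes of the sample message shown later in this post.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// splitKey separates a decoded protobuf key into its field number and its
// three-bit wire type (0 = varint, 1 = 64-bit, 2 = length-delimited,
// 5 = 32-bit). The name is hypothetical, chosen for this example.
func splitKey(key uint64) (field uint64, wireType uint8) {
	return key >> 3, uint8(key & 0x7)
}

func main() {
	// Key byte 0x08 followed by value byte 0x02.
	buf := []byte{0x08, 0x02}

	key, n := binary.Uvarint(buf) // n is the number of bytes consumed
	field, wireType := splitKey(key)
	value, _ := binary.Uvarint(buf[n:])
	fmt.Printf("field=%d wireType=%d value=%d\n", field, wireType, value)

	// Larger values need more bytes: 300 is encoded as 0xac 0x02.
	big, used := binary.Uvarint([]byte{0xac, 0x02})
	fmt.Printf("value=%d bytes=%d\n", big, used)
}
```

The low three bits of the key carry the wire type and the remaining bits carry the field number, which is why a single byte is enough for the keys of fields 1 through 15.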

Usually, to correctly interpret a protobuf message, you need to know the schema. This is because the same protobuf type can represent several possible application types - e.g. a 32-bit fixed-width field can hold a floating point number, an unsigned integer, a signed integer, or even a timestamp. You can, however, correctly decode a protobuf message with no schema and display a rough approximation of its structure. This is because, by design, all the fields have known lengths, and enough type information to display a reasonable representation of them. You can do exactly this using the protobuf compiler, by doing protoc --decode_raw < message.
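
As a rough illustration of why no schema is needed to find field boundaries, the Go sketch below splits a buffer into its top-level fields using only the wire type in each key. The rawField type and walk function are hypothetical names for this example, not part of protoc or protofudger, and the deprecated group wire types (3 and 4) are simply rejected.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// rawField is one key/value pair read without a schema.
type rawField struct {
	Number   uint64 // field number from the key
	WireType uint8  // 0 varint, 1 fixed64, 2 length-delimited, 5 fixed32
	Varint   uint64 // set for wire type 0
	Bytes    []byte // raw payload for wire types 1, 2 and 5
}

var errMalformed = errors.New("malformed protobuf data")

// walk splits buf into its top-level fields. It relies only on the wire
// type in each key to know how many bytes the value occupies.
func walk(buf []byte) ([]rawField, error) {
	var fields []rawField
	for len(buf) > 0 {
		key, n := binary.Uvarint(buf)
		if n <= 0 {
			return nil, errMalformed
		}
		buf = buf[n:]
		f := rawField{Number: key >> 3, WireType: uint8(key & 7)}
		if f.Number == 0 {
			return nil, errMalformed // field number 0 is never valid
		}
		size := 0
		switch f.WireType {
		case 0: // varint: self-delimiting
			v, m := binary.Uvarint(buf)
			if m <= 0 {
				return nil, errMalformed
			}
			f.Varint = v
			buf = buf[m:]
			fields = append(fields, f)
			continue
		case 1: // 64-bit fixed-width
			size = 8
		case 5: // 32-bit fixed-width
			size = 4
		case 2: // length-delimited: a varint length, then that many bytes
			l, m := binary.Uvarint(buf)
			if m <= 0 || l > uint64(len(buf)-m) {
				return nil, errMalformed
			}
			buf = buf[m:]
			size = int(l)
		default:
			return nil, errMalformed
		}
		if size > len(buf) {
			return nil, errMalformed
		}
		f.Bytes = buf[:size]
		buf = buf[size:]
		fields = append(fields, f)
	}
	return fields, nil
}

func main() {
	// Field 1 = varint 2, then field 4 = three length-delimited bytes.
	msg := []byte{0x08, 0x02, 0x22, 0x03, 0x08, 0x81, 0x01}
	fields, err := walk(msg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range fields {
		if f.WireType == 0 {
			fmt.Printf("%d: %d\n", f.Number, f.Varint)
		} else {
			fmt.Printf("%d: %x\n", f.Number, f.Bytes)
		}
	}
}
```

Calling walk again on the Bytes of a length-delimited field is one way to test whether that field might be an embedded message: if the call returns no error, the payload at least has the shape of protobuf data.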

What protofudger does is similar to that process, except that it makes a bit more of an effort to figure out what the likely types are for numeric and variable-length values. It has some rough heuristics to identify floats, integers, signs, timestamps, and embedded messages. The rule to determine the most likely numeric type is to find the type where the value is closest to zero. This seems to yield very few incorrect results. Timestamps are detected as numbers between 1400000000000 and 1500000000000, or 1400000000 and 1500000000. These will need to be adjusted at some point in the future, or even now if you’re dealing with timestamps in the past. Again though, currently it seems to yield very few incorrect results. Embedded messages are detected by trying to parse variable-length fields as protobuf data. If it parses successfully, it’s displayed as such. Otherwise, it’s either text or a byte array. Text is just any byte array which is valid UTF-8 text.
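
The two numeric heuristics can be sketched in a few lines of Go. This is an illustration of the rules as described above, not protofudger's actual code: the function names are hypothetical, and two details are assumptions of this sketch. First, NaN, infinite and subnormal doubles are ignored, since otherwise every small integer would look like a tiny float. Second, the larger timestamp range is treated as milliseconds since the Unix epoch and the smaller one as seconds.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Smallest positive normal float64; anything below it is subnormal.
const minNormal = 2.2250738585072014e-308

// guessFixed64 picks the interpretation of a 64-bit fixed-width value
// whose magnitude is closest to zero: unsigned, signed, or double.
func guessFixed64(bits uint64) (kind string, value float64) {
	kind, value = "uint64", float64(bits)

	if s := float64(int64(bits)); math.Abs(s) < math.Abs(value) {
		kind, value = "int64", s
	}

	// Assumption of this sketch: skip NaN, infinite and subnormal doubles.
	f := math.Float64frombits(bits)
	plausible := !math.IsNaN(f) && !math.IsInf(f, 0) && math.Abs(f) >= minNormal
	if plausible && math.Abs(f) < math.Abs(value) {
		kind, value = "double", f
	}
	return kind, value
}

// looksLikeTimestamp reports whether v falls inside one of the two ranges
// quoted above. The units (milliseconds, seconds) are an assumption.
func looksLikeTimestamp(v uint64) (time.Time, bool) {
	switch {
	case v >= 1400000000000 && v <= 1500000000000:
		return time.Unix(0, int64(v)*int64(time.Millisecond)).UTC(), true
	case v >= 1400000000 && v <= 1500000000:
		return time.Unix(int64(v), 0).UTC(), true
	}
	return time.Time{}, false
}

func main() {
	// A 64-bit fixed-width value taken from the sample message in this post.
	kind, v := guessFixed64(0xc042e95920000000)
	fmt.Printf("%s %f\n", kind, v)

	// A varint taken from the same message.
	if ts, ok := looksLikeTimestamp(1468559641400); ok {
		fmt.Println(ts)
	}
}
```

Read as an unsigned or signed integer, 0xc042e95920000000 is an enormous number, while read as a double it is a small negative value, so the closest-to-zero rule picks the double.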

protofudger is mostly useful for inspecting protobuf messages of unknown provenance and structure. It’s only meant to help you interpret the messages - it won’t get things perfect every time.

Let’s look at it working!

This is a protobuf message that was generated while I was playing Pokemon GO. It’s encoded with base64:

CAIYoYCAgLDr8rBdIu4BCGoS6QEKvQGAgICA1OKZ62qAgICA9OeZ62qAgICA/OeZ62qAgICAhO
iZ62qAgICAjOiZ62qAgICAlOiZ62qAgICAnOiZ62qAgICApOiZ62qAgICArOiZ62qAgICAvOiZ
62qAgICAhO2Z62qAgICAjO2Z62qAgICAlO2Z62qAgICAnO2Z62qAgICApO2Z62qAgICArO2Z62
qAgICAtO2Z62qAgICAvO2Z62qAgICAxO2Z62qAgICA7O2Z62qAgICA9O2Z62oSFQAAAAAAAAAA
AAAAAAAAAAAAAAAAABkAAAAgWelCwCEAAAAApxxiQCICCH4iCwgEEgcIuJbn594qIgMIgQEiLg
gFEioKKDRhMmU5YmMzMzBkYWU2MGU3Yjc0ZmM4NWI5ODg2OGFiNDcwMDgwMmUiAgh+IgsIBBIH
CLiW5+feKiIDCIEBIi4IBRIqCig0YTJlOWJjMzMwZGFlNjBlN2I3NGZjODViOTg4NjhhYjQ3MD
A4MDJlMqgCCAYSowIKoAIevmSZkPl4snZAgw9zZJwhPyy4wICz9W2+ziwCNIycUkwmG5rnQgLk
L5YsZlqtMzyi992KElR992zh5wN8bxJc6z+I6H+NUxVl4OiJjTDy8ToW8IIrYeLVXKiaDsRiqp
4bknu+sttoOaH2HNWpnoqadxGBmZTIqNMK09TsF0ykJNB2AUY5uc/U5ja8PZknNTCaggyvCiUN
JE/ypuYDQIvQ7kDtDOxReT77zWIds+fWGYgndj2kuzZygTMd2PZFLw5n9PWx0nL4F6SgwbzcNg
GmcZKwZJZmdw0CkxzZilCR3SXdMcJoKeoBy9VEI1j2WrxI6bIi2xpuinKX4YPqllKxzEfvSuWa
jzFkvM6+jMUuLJJVpcfb7iiUrkPDidhhtKo5AAAAIFnpQsBBAAAAAKccYkBJAAAAYFcPWUBaWw
pAP3eDe+Iap/8SoX4MCIY6OKqjDONBOktFns4uDQ5AoTvtfNwBbSs01SemwA3P2HUnziVSfzkJ
fZKQEfFbz5ZGOxDDj8vo3ioaEEDYhnhioPpcQS6De7xceNVgxis=
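
If you want to follow along, the base64 text first has to be turned back into raw bytes. A small Go sketch for that step follows. The decodeMessage name is made up for this example, and it strips whitespace before decoding so the line-wrapped text above can be pasted in unchanged.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"os"
	"strings"
)

// decodeMessage turns base64 text, possibly wrapped across several lines,
// into the raw protobuf bytes. The name is hypothetical.
func decodeMessage(text string) ([]byte, error) {
	compact := strings.Join(strings.Fields(text), "")
	return base64.StdEncoding.DecodeString(compact)
}

func main() {
	// Read base64 from standard input and write raw bytes to standard output.
	in, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	raw, err := decodeMessage(string(in))
	if err != nil {
		fmt.Fprintln(os.Stderr, "decode error:", err)
		os.Exit(1)
	}
	os.Stdout.Write(raw)
}
```

Assuming the program is saved as decode.go and the text above as message.b64 (both file names are placeholders), something like `go run decode.go < message.b64 | protoc --decode_raw` should feed the raw bytes straight into protoc.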

Here’s how it looks if you decode it with protoc --decode_raw. I’ve truncated some of the fields, since they were very long and provided no useful context.

1: 2
3: 6728882909970694177
4 {
  1: 106
  2 {
    1: "\200\200\200\200\324\342\231\353j\200\200\200\200\364\347\231\353j..."
    2: "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
    3: 0xc042e95920000000
    4: 0x40621ca700000000
  }
}
4 {
  1: 126
}
4 {
  1: 4
  2 {
    1: 1468559641400
  }
}
4 {
  1: 129
}
4 {
  1: 5
  2 {
    1: "4a2e9bc330dae60e7b74fc85b98868ab4700802e"
  }
}
4 {
  1: 126
}
4 {
  1: 4
  2 {
    1: 1468559641400
  }
}
4 {
  1: 129
}
4 {
  1: 5
  2 {
    1: "4a2e9bc330dae60e7b74fc85b98868ab4700802e"
  }
}
6 {
  1: 6
  2 {
    1: "\036\276d\231\220\371x\262v@\203\017sd\234!?,..."
  }
}
7: 0xc042e95920000000
8: 0x40621ca700000000
9: 0x40590f5760000000
11 {
  1: "?w\203{\342\032\247\377\022\241~\014\010\206:8..."
  2: 1468561278915
  3: "@\330\206xb\240\372\\A.\203{\274\\x\325"
}
12: 5574

And here’s how it looks with protofudger. Again, I’ve truncated some of the fields. I think the protofudger output is far more interesting!

decoded 17 fields

  1: (varint) 2
  3: (varint) 6728882909970694177
  4: {
    1: (varint) 106
    2: {
      1: (bytes) 80808080d4e299eb6a80808080f4e799eb6a...
      2: (string) "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      3: (doublele) -37.823032
      4: (doublele) 144.895386
    }
  }
  4: {
    1: (varint) 126
  }
  4: {
    1: (varint) 4
    2: {
      1: (varint, microseconds) 2016-07-15 15:14:01 +1000 AEST
    }
  }
  4: {
    1: (varint) 129
  }
  4: {
    1: (varint) 5
    2: {
      1: (string) "4a2e9bc330dae60e7b74fc85b98868ab4700802e"
    }
  }
  4: {
    1: (varint) 126
  }
  4: {
    1: (varint) 4
    2: {
      1: (varint, microseconds) 2016-07-15 15:14:01 +1000 AEST
    }
  }
  4: {
    1: (varint) 129
  }
  4: {
    1: (varint) 5
    2: {
      1: (string) "4a2e9bc330dae60e7b74fc85b98868ab4700802e"
    }
  }
  6: {
    1: (varint) 6
    2: {
      1: (bytes) 1ebe649990f978b27640830f73649c213f2cb8c080b3f56dbece2c02348...
    }
  }
  7: (doublele) -37.823032
  8: (doublele) 144.895386
  9: (doublele) 100.239708
  11: {
    1: (bytes) 3f77837be21aa7ff12a17e0c08863a38aaa30ce3413a...
    2: (varint, microseconds) 2016-07-15 15:41:18 +1000 AEST
    3: (bytes) 40d8867862a0fa5c412e837bbc5c78d5
  }
  12: (varint) 5574

So that’s just one message. Most of the fields are mysteries, but I expect in the next couple of months we’ll start to get some more understanding of what’s going on.

There are already a few third-party projects that interact with this protocol data. My favourite right now is the PokeRev Mapper. It’s set up to passively monitor communications between volunteers’ Pokemon GO instances and the backend, so it shouldn’t result in anyone getting banned. I’ve seen a couple of other projects that actively intercept data, and might result in soft or hard bans as they’re detected.

Given that the communications are so easy to work with right now, I’ll be interested to see what comes out of the Pokemon GO reversing community. I’ll be equally interested to see how Niantic responds to our research and activities.

Now go, catch ’em all!
