Reverse engineering tools

Dear Lazyweb: What are reasonable reverse engineering tools for web sites?

It seems like every few months I find myself cracking the login and upload or download process on some site -- sorry, some "web application". Invariably they either don't provide an API, or their API is wholly inadequate. The "new web" doesn't want you to script it, because that might prevent them from forcing lock-in on you. They all want to be titans of the industry like Compuserve or AOL, apparently not having heard about this little thing called "The Internet" that got really popular for a minute back in the 90s.

So to do the things I want to do, I often have to crack their undocumented protocols and halfassed security measures. I don't enjoy it, but for my sanity and out of self defense, I do it a lot. "Nation Suddenly Realizes This Just Going To Be A Thing That Happens From Now On".

The kind of discoveries I end up needing to make usually look like:

  • Their OAuth "application" API is inadequate and intentionally crippled, so let's go straight for the web login page and get a session cookie.
  • Oh look, here's the magic URL you are squirting JSON data down.
  • Oh, but the arguments to that URL are signed.
  • Oh, here's the signing key you embedded in the code but tried to hide.
  • (And you're sniffing user agents. Aw, that's cute.)
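Once those pieces are known, the script that replaces the browser tends to be short. Here is a minimal Python sketch of the signing and user-agent steps. Everything specific in it is hypothetical: the key, the parameter names, the `sig` field, and the choice of HMAC-SHA256 over the sorted query string are assumptions for illustration, since every site invents its own scheme.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical values: a real site has its own key, parameter names,
# and signing scheme, all recovered from its JavaScript.
SIGNING_KEY = b"key-dug-out-of-the-minified-js"
USER_AGENT = "Mozilla/5.0 (Macintosh) AppleWebKit/537.36 Safari/537.36"


def sign_params(params, key=SIGNING_KEY):
    """Return a query string with a signature appended.

    Assumed scheme: HMAC-SHA256 over the url-encoded parameters,
    sorted by name, sent as an extra "sig" parameter.
    """
    canonical = urlencode(sorted(params.items()))
    digest = hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()
    return canonical + "&sig=" + digest


def request_headers(session_cookie):
    """Headers that make the script look like the browser it replaces."""
    return {"User-Agent": USER_AGENT, "Cookie": session_cookie}


if __name__ == "__main__":
    print(sign_params({"album": "42", "action": "upload"}))
```

The session cookie passed to `request_headers` would come from replaying the web login form, which is the first step in the list above.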

I don't have proper tools to easily do the sorts of things I need to do to solve these problems. I mean, I manage, obviously, but it sucks. Here are the kind of questions I find myself asking that are harder to answer than they should be:

  • This form's "Submit" button isn't actually a form element, and the source doesn't have an onclick handler on it. Something somewhere else has installed a handler ...somewhere... so that when I click it, a JS function runs and a URL gets loaded. What function? What URL?

  • Clicking this thing reads and writes a bunch of data to random URLs via XMLHttpRequest, then does a redirect. What URLs did it load and what did it send and receive? Sometimes I can answer this question using the Resources or Timeline panel in Safari's inspector, but as far as I can tell, the intermediate data vanishes from the timeline as soon as the top-level URL changes, or the DOM gets zeroed out, or something. I don't know. I just know that I can't see a record of URLs being loaded that I know were loaded. Mozilla and Firebug don't seem to be any better than Safari in this respect. "Oh, the document is gone, you must not care about it any more."

  • This page does some Javascript contortions and eventually emits a <video> tag and configures it via JS. Eventually that tag initiates network activity. What URL is it loading? It doesn't show up in the inspector. Is it happening out-of-process?

  • A URL is being loaded at the bottom of a giant stack of obfuscated, minimized Javascript. What's the call stack? Basically: if I lived in a world where "javascript debugger" was a thing that actually existed (ha ha ha), how do I set a breakpoint on any network activity?
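For the "what did it send and receive" question, one workaround assumes a browser that can keep its network log across navigations and export it as a HAR file (a JSON format with a `log.entries` list). Given such a file, a rough Python sketch to list what went where might look like this; the script and file names are made up.

```python
import json
import sys


def summarize_har(har):
    """Yield (method, url, status, request_body) for each HAR entry."""
    for entry in har["log"]["entries"]:
        request = entry["request"]
        response = entry["response"]
        # postData is only present for requests that sent a body.
        body = request.get("postData", {}).get("text", "")
        yield request["method"], request["url"], response["status"], body


# Usage (hypothetical names): python har_summary.py capture.har
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for method, url, status, body in summarize_har(json.load(f)):
            print(status, method, url, body[:80])
```

This only helps after the fact. It says nothing about which JS function fired the request, which is the call-stack question above.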

I could use mitmproxy and Wireshark for some of this, but that's a huge pain in the ass, and more heavy-handed than I usually need. Also, Wireshark is awful (it always leaves me thinking "How was this supposed to be any better than tcpdump?"). It makes much more sense to intercept this stuff inside the browser. All the information is in there since it's the thing initiating contact with the server.
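For a sense of what the proxy route involves: mitmproxy can load small Python addon scripts. The sketch below assumes mitmproxy's `response` hook and its `flow.request` / `flow.response` objects; the class name, the log format, and the file name are invented for illustration.

```python
class RequestLog:
    """Keep one summary line per completed request/response pair."""

    def __init__(self):
        self.lines = []

    def response(self, flow):
        # mitmproxy calls this hook once the full response has arrived.
        req, resp = flow.request, flow.response
        body = req.get_text(strict=False) or ""
        line = "%s %s -> %d (sent %d chars)" % (
            req.method, req.pretty_url, resp.status_code, len(body))
        self.lines.append(line)
        print(line)


# mitmproxy looks for a module-level list named "addons".
addons = [RequestLog()]
```

Saved as, say, `log_requests.py`, it would be run with `mitmdump -s log_requests.py`, with the browser pointed at the proxy and trusting its certificate. That setup is exactly the heavy-handed part.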

Previously, previously, previously, previously, previously, previously, previously.

33 Responses:

  1. Adam Demasi says:

    Chrome devtools lets you preserve the network and console tabs’ contents when the page changes.

  2. Adam Fields says:

    Maybe you should write a browser.

  3. John Adams says:

    Nearly everything you're asking for is inside of the Chrome debugger. Say what you will of Google, but their developers have made a really excellent debugging tool here. I have been doing a considerable amount of JS debugging lately and I don't think I could do it using say, Safari or Firefox. Their developer tools are horrible.

    You can monitor XHR, attack JS directly, modify JS code in place and see what handlers are attached to what DOM Objects. That's more than enough to take apart most APIs.

    "Basically: if I lived in a world where "javascript debugger" was a thing that actually existed (ha ha ha), how do I set a breakpoint on any network activity?"

    Chrome's JS debugger is fantastic. You can set breakpoints on XHR and trace async calls. It'll even maintain breakpoints across page reloads.

    • Jay says:

      I was going to say the same thing. One additional feature to note in Chrome is that you can inspect an element, say a fake form button, and see the event listeners attached to it.

      • Chris says:

        You can, but this gets a bit tricky as you'll invariably end up in the framework code since almost no one binds event listeners directly, instead using something like jQuery. It takes a bit of work to figure out how to get to the actual bindings from there.

        I find it is usually simpler to break on the XHR request and then inspect the stack trace to follow the execution flow back to the actual bound application code.

    • Came to post pretty much exactly the same thing. Chrome is in many ways a terrible, terrible joke as a "web browser", but as a Javascript debugger it's aces.

    • Chris Davies says:

      Eh, you say that but last time I checked Chrome's debugger had two misfeatures that render it more or less useless for reverse engineering other people's code.

      For one thing, it lets you pretty print minified javascript, but it won't regenerate the debugging information based on the pretty printing. So any breakpoint you set in minified code is on virtual line zero and will never trigger appropriately.

      Secondly, even when you have the debugger open it won't spare a couple of bytes to track the source origin of a lambda function, so executing lambdas are effectively black boxes. Since all Javascript developers are idiots, and are aided and abetted in their idiocy by frameworks like jQuery, deployed application callstacks are usually 20 frames of framework-du-jour bullshit at the bottom, a frame of opaque crap in the middle, then another 20 frames of bullshit and the native call that actually does something at the top.

      • John Adams says:

        Not my experience at all debugging React, Meteor, or Angular, all "frameworks du jour".

        While I agree it's difficult to debug minimized JS, the minimized debugging problem has been pretty much fixed in recent versions and pretty print allows you to set breakpoints within pretty-printed JS. Haven't had issues with that in a while.

        Also, contending that everyone is an "idiot" exercising "idiocy" by using frameworks doesn't strengthen your argument. Not everyone wants to write the world from first principles like it's 1995.

        • jwz says:

          Not everyone wants to write the world from first principles like it's 1995.

          Well, I do, but I accept that I am in the sad situation of needing to crack systems written by people with more regrettable tastes than mine.

  4. @bizzyunderscore says:

    Burp suite?

    • @bizzyunderscore says:

      Never mind, you're interested in what's going on in the browser end. I'd use the Chrome debugging tools.

  5. For HTTP-level things, I've had some luck with Charles - it has a workable timeline or per-host view, and it can MITM TLS connections for you to show you what's going on under the hood.

  6. db48x says:

    The Firefox dev tools are also pretty good. Click the gear icon to get to the prefs, then check the 'Enable persistent logs' checkbox. You can turn on a bunch of other gewgaws in there as well, if you're so inclined. It'll also unminify javascript, which is often helpful.

    On the other hand, I don't believe that the Firefox dev tools let you set a breakpoint on XHR activity, which is a nice feature of the Chrome dev tools.

  7. Ole Eichhorn says:

    A golden tool for this type of work is Fiddler. It neatly inserts itself as an HTTP proxy, and allows you to inspect and modify all HTTP(S) traffic between a browser and a host. Anything the browser does, you can do, and anything the server sends back, you can capture.

    In addition to being helpful for reverse engineering, it's also quite handy for debugging.

    • Captain Obvious says:

      Fiddler is an alright tool, but:

      1. He said he's not interested in a web proxy, since (understandably) he feels that's overkill (and it doesn't actually solve most of his issues w.r.t. JavaScript),

      2. Fiddler is Windoze-only. He's using Safari. My Common Sense is tingling, and it's telling me he won't want to run a VM just for this.

  8. Lazyweb, Esq. says:

    Agreeing about Chrome dev tools being awesome for stuff like this.

    I'd also add that CasperJS or simply PhantomJS are also useful for scripting these kinds of interactions if you're too lazy to reverse-engineer low level details and just want to do high-level interactions in a headless browser.

    Is it less efficient to actually load and render the page in a headless browser and simulate an onclick event? Yes.

    Do you need the cronjob you're using to download ultraporn to be so computationally efficient it'll be an example in the next edition of AoCP? Probably not.

    • rob says:

      I'll continue the irrelevant thread by mentioning that in addition to PhantomJS, the Selenium project is pretty good at automating a number of real browsers and giving you access to the DOM so you can do things like "when you see a form field with name like 'login' or 'username' put my username in there" or "when you see a submit button, click it"

      As an added bonus it has Perl support!
      https://metacpan.org/pod/Selenium::Remote::Driver

  9. If you're using Wireshark, then you should instead be using some sort of MITM proxy. People have suggested Burp (free version sucks), Fiddler (Windows) and Charles (not terrible); I'd actually suggest OWASP ZAP.

    For client side stuff, Chrome's tools are far better than Safari's.

  10. Richard Cheese says:

    As with option #1 in https://www.jwz.org/doc/backups.html , the only winning move is not to play.

  11. Jameson says:

    +1 for PhantomJS - the headless (javascript-)scripted browser. No need to reverse-engineer huge minimized obfuscated js dumps, it's all done automatically, you just parse and return JSON of whatever results you want.

  12. Ingmar says:

    httpfox solves all your "what network request went where and what did it contain" needs. It won't forget them until told so, and it captures everything, including plugin activity.

  13. Kaleberg says:

    If you are really lazy you can buy a copy of Fake from the app store. It is a browser with an Automator like scripting language. I use it to fetch website logs and financial information. The main advantage is that it provides a superficial interface so you don't have to grovel deep inside ten layers of JS package and DOM structure to figure out what is happening.

  14. Aquarion says:

    Chrome debugging tools will also render every request it makes as a curl command, if you right click on it in the network debug. But for the kind of puppeting of the zombie corpse of web development you're talking about, PhantomJS is a headless web browser with a sane API that - so far - gets around most of the "You're not my real browser" tricks.
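A rough Python sketch of turning one of those copied curl commands into the pieces a script needs. It assumes the bash-style output with single-quoted arguments and backslash line continuations, and it only understands the few options listed in the code; the function name is made up.

```python
import shlex

# The only value-taking options this sketch understands.
HEADER_FLAGS = ("-H", "--header")
DATA_FLAGS = ("-d", "--data", "--data-raw", "--data-binary")
COOKIE_FLAGS = ("-b", "--cookie")
METHOD_FLAGS = ("-X", "--request")


def parse_curl(command):
    """Split a copied curl command into method, URL, headers, and body.

    Any other option starting with "-" is assumed to take no value
    and is ignored, which is wrong for some curl options.
    """
    # Join backslash-newline continuations before tokenizing.
    tokens = shlex.split(command.replace("\\\n", " "))
    if not tokens or tokens[0] != "curl":
        raise ValueError("not a curl command")
    req = {"method": None, "url": None, "headers": {}, "data": None}
    args = iter(tokens[1:])
    for tok in args:
        if tok in HEADER_FLAGS:
            name, _, value = next(args).partition(":")
            req["headers"][name.strip()] = value.strip()
        elif tok in COOKIE_FLAGS:
            req["headers"]["Cookie"] = next(args)
        elif tok in DATA_FLAGS:
            req["data"] = next(args)
        elif tok in METHOD_FLAGS:
            req["method"] = next(args)
        elif not tok.startswith("-"):
            req["url"] = tok
    if req["method"] is None:
        # curl defaults to POST when a body is given, GET otherwise.
        req["method"] = "POST" if req["data"] is not None else "GET"
    return req
```

The resulting dict can be fed to whatever HTTP client the script already uses, so the request can be replayed with different arguments.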

  15. berdario says:

    I second Burp Proxy.

    While it's out-of-browser (and not open source), Chrome and Firefox dev tools pale in comparison to it for this use case. And it's arguably less of a PITA than mitmproxy.

  16. Kyzer says:

    I don't know why more people aren't saying Firebug. It's a lot better than the johnny-come-lately built-in Firefox debugger.

    Something somewhere else has installed a handler

    Open Firebug, HTML tab, click the arrow thingy. Select the pesky submit button (either by moving the mouse over the document and clicking, or by using the HTML view). Now go to the events tab on the right to get all applicable listeners.

    "Oh, the document is gone, you must not care about it any more."

    Open Firebug, network tab, toggle the "Persist" button. Firebug now remains open and keeps history between page redirects. You can also do the same on the console tab.

    What URL is it loading?

    Open Firebug, HTML tab, select the video element, right click and "view in DOM inspector". It's the currentSrc value.

    how do I set a breakpoint on any network activity?

    Open Firebug, network tab, the little yellow pause icon with "XHR" written on it ("Break on XHR"), which will bring you into the script debugger at the next request. Yes, it's not the same thing as breaking on all network activity, but unless the page is using document.write("<img>") smoke signals for communication, usually it's all you need.

    Once taken to the script debugger, go to the "Stack" tab which has the full stack trace.

    • Steve Nordquist says:

      Thanks for mentioning Firebug. And that it's not the built-in anymore. And the walksies.

  17. David Gurba says:

    It sounds like you want something somewhat scriptable ... how about https://github.com/sidorares/crconsole

    crconsole is a remote Javascript console for Chrome/Webkit that runs in your terminal.
