stupid gov't web sites

Dear Lazyweb,

I am curious to know how many businesses in San Francisco have both an ABC type 47 license (state-issued), and a Place of Entertainment license (city-issued). So I want to get both lists, so that I can compute the intersection. I'm pretty sure this is all public information, and I think I've even seen it before, but I can't find it now. ABC has a search form, but it won't give me the whole list. And I don't see anything on at all... Can you find it?

  1. xrayspx says:

    Searching by licensee name and the Company radio button.

    If you search for % it complains about over 100 results. If you search for DNA it gives 3, searching on %DNA% gives the expected results of anything with DNA in it, and gives 7 results.

    If it can be made to let you do a real query, it might be possible to make it do sets of 99.

    In other words, annoyingly like work. Why does everyone make users jump through hoops?

    • xrayspx says:

      Yeah, this works:'%'&Entity=C

      That will return all entries for Companies, for example.

      This one: "'B%'&Entity=C" returns all results beginning with "B", with results here.

      The problem is that the ASP times out before it gives you all the results. What I'm thinking is to make a script that cycles through all the sets: 'A%', 'B%', and so on through 'ZX%', 'ZY%', 'ZZ%'. The single characters should get anything starting with that letter with a space behind it, and the double starting characters should get most things with each starting character, though "BR" or "TH" might still be too many.
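A rough sketch of the pattern list described above, assuming the search form treats `%` as a SQL LIKE-style wildcard (which the `%DNA%` result suggests). The function name `prefix_patterns` is made up for illustration, and nothing here talks to the ABC site.

```python
from itertools import product
from string import ascii_uppercase


def prefix_patterns(max_len=2, alphabet=ascii_uppercase):
    """Yield prefix patterns 'A%' ... 'Z%', then 'AA%' ... 'ZZ%'."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters) + "%"


patterns = list(prefix_patterns())
print(len(patterns))                 # 26 single-letter + 26*26 double-letter
print(patterns[:3], patterns[-3:])
```

If `%` really is a LIKE wildcard, 'A%' and 'AA%' will overlap, so the collected results would need de-duplicating afterwards.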

      I've been out of work for one day. If this is the kind of stir crazy I've already become, I hope unemployment doesn't last long.

      • rapier1 says:

        It times out because you are looking at the whole state. If you drill down to the city it may work better: 'C%25%25'&City='SAN%20FRANCISCO'

        Replace the C in 'C%25%25' with each of [A-Z0-9]
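For anyone puzzled by the `%25`: that is just a percent-encoded `%`, so 'C%25%25' is the pattern 'C%%'. A small sketch of producing those fragments with Python's standard library follows. The helper name `encode_value` is hypothetical, and the name of the first query parameter is not shown in the thread, so it is left out here too.

```python
from string import ascii_uppercase, digits
from urllib.parse import quote, unquote


def encode_value(value):
    """Percent-encode a quoted search value, keeping the single quotes."""
    return quote(value, safe="'")


city = encode_value("'SAN FRANCISCO'")

# One query fragment per starting character, in the shape quoted in the thread.
fragments = [
    encode_value(f"'{ch}%%'") + "&City=" + city
    for ch in ascii_uppercase + digits
]

print(fragments[2])
print(unquote(fragments[2]))
```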

        • rapier1 says:

          Oh, obviously something like '1' is going to be your biggest return. So you may need to break that down into 10, 11, 12, etc... or even 110, 111, 112, etc...
          but obviously this only makes it marginally more difficult than the trivial solution required for the above.
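The "break it down further" idea can be sketched as a loop that keeps lengthening any prefix whose result count is still too big. This is only an illustration under assumptions: `count_matches` stands in for whatever tells you how many rows a prefix returns, the limit of 100 is borrowed from the "over 100 results" complaint earlier in the thread, and the street addresses are invented test data.

```python
def split_prefixes(prefixes, count_matches, limit=100,
                   alphabet="0123456789 ", max_len=6):
    """Lengthen prefixes until each one matches at most `limit` entries.

    The space in `alphabet` covers a street number that ends right after
    the prefix ("1 MARKET ST" for the prefix "1"). `max_len` is a safety
    stop so the loop always terminates.
    """
    small_enough = []
    pending = list(prefixes)
    while pending:
        prefix = pending.pop()
        matches = count_matches(prefix)
        if matches == 0:
            continue  # nothing there, drop the prefix
        if matches <= limit or len(prefix) >= max_len:
            small_enough.append(prefix)
        else:
            pending.extend(prefix + ch for ch in alphabet)
    return sorted(small_enough)


# Invented data: street numbers 1 through 300 on one street.
addresses = [f"{number} MARKET ST" for number in range(1, 301)]


def count_matches(prefix):
    return sum(address.startswith(prefix) for address in addresses)


prefixes = split_prefixes("123456789", count_matches)
print(prefixes)
```

With this data only '1' and '2' are over the limit, so only those get split into '10', '11', and so on.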

          • xrayspx says:

            Yeah, I wrote a quick script to do three iterations, so AAA through ZZZ. When it ran, I started getting a lot of 500s, so I stopped :-) I should have made it wait until the first request was done before wget'ing the next page. The site is still up, so, good, but the idea works.
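A minimal sketch of the "wait until the first is done" version: one request at a time, with a pause in between. It is not the script from the thread (that one used wget). The `fetch` argument is a placeholder for whatever actually downloads a page, and the demo below uses a fake fetcher and a fake sleep so it runs instantly and touches no server.

```python
import time


def fetch_sequentially(urls, fetch, delay=1.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing `delay` seconds between requests.

    The next request starts only after the previous `fetch` call has
    returned, so the server never sees more than one request at once.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)
        pages[url] = fetch(url)
    return pages


# Demo with stand-ins: no network access and no real waiting.
pauses = []
demo_pages = fetch_sequentially(
    ["AAA", "AAB", "AAC"],
    fetch=lambda url: f"<page for {url}>",
    delay=1.0,
    sleep=pauses.append,
)
print(demo_pages)
print(pauses)
```

For real use, `fetch` could be something like a small wrapper around `urllib.request.urlopen(url).read()`, ideally one that also backs off when it sees a 500.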

            Input validation can be a bitch, I guess that's why no one does much of it.

            • jwz says:

              Well, none of that has worked at all.

              • xrayspx says:

                Yeah, the quickest, easiest thing I got was this list of addresses in SF with active licenses. The ideal thing would be if the search by Doing Business As would let you tack a city in there; otherwise, it'd be back to scraping the entire state to get all the business names and then pulling out the SF ones, which just seems like it could take a while.

                Maybe if you get the list of Places of Entertainment, it will be searchable by address so the two lists can be collated.
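If both lists do turn up with addresses, the collation could look roughly like this: normalize each address, then keep the ABC rows whose address also appears in the Place of Entertainment list. Everything below is an assumption for illustration, including the `(name, address)` row shape, the function names, and the sample rows, which are invented and are not real licensees.

```python
import re


def normalize_address(address):
    """Uppercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", address.upper())
    return " ".join(cleaned.split())


def intersect_by_address(abc_rows, poe_rows):
    """Return the ABC rows whose address also appears in the POE rows."""
    poe_addresses = {normalize_address(address) for _, address in poe_rows}
    return [
        (name, address)
        for name, address in abc_rows
        if normalize_address(address) in poe_addresses
    ]


# Invented sample rows, only to show the shape of the data.
abc_rows = [("Venue A", "123 Example St."), ("Venue B", "456 Sample Ave")]
poe_rows = [("VENUE A LLC", "123  example st"), ("Venue C", "789 Other Blvd")]

both = intersect_by_address(abc_rows, poe_rows)
print(both)
```

Real address data would likely need more cleanup than this (suite numbers, "STREET" versus "ST"), so treat the normalizer as a starting point.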

                All in the name of all-ages/lower-ages shows, is that right? It's a noble cause, if you can get by all the roadblocks. How about Chuck E Cheese, do they count? I heard they sell beer, maybe they know the trick.

                • jwz says:

                  I've gotten even less far on finding the list of POE permits than 47 permits.

                • jwz says:

                  So is that list you got every ABC license (of any kind) in SF? Or something else?

                  • xrayspx says:

                    That list is Active ABC licenses in SF, I have all the data for Revoked, Pending, Withdrawn, etc, but I didn't think that was what we were talking about so I grepped out just the active ones. I used rapier1's idea of going by address, so all the 1's, all the 2's, etc, and none of those pages timed out. They must have an index there.

                    I'm adding the DBA business name to each address line, but that involves scraping all the URLs in the page I made above, and dumping the Doing Business As line out, and adding that to the tables. It's going to take a while at about a page a second for 3200 pages, then I'll add that line in later this evening. Should make it easier to match up to the other list if it ever turns up. At the least you can skim it and say "hey, I KNOW they must have a PoE license" by name if nothing else if you see someplace that you know is doing exactly what you want to do.

                    3200 liquor licenses seems a bit light to me for a city the size of SF, but I might be wrong, never been there.

                  • xrayspx says:

                    There's a better list, same link.

                    Now I've jammed the business name in there, as well as the license type. The link in the table still goes to their state license page.

                    If they hold more than one license, I'm only returning the first one; sed was complaining about unescaped newlines and 3:00am wasn't the time to fix it. For instance, The Warfield has 47 (which you want), 58 (catering) and 30 (temp), but I'm only getting the 47. Someplace like Slim's only has a 47, to use two examples from your other post.
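One way around the newline problem, sketched in Python rather than sed: take the multi-line license field as a single string and join its lines before writing the table row. The field layout (one license type per line) is an assumption about the scraped text, and `join_license_types` is a made-up helper name. The values are the Warfield and Slim's examples mentioned above.

```python
def join_license_types(raw_field):
    """Turn a multi-line field like '47\\n58\\n30' into '47, 58, 30'."""
    types = [line.strip() for line in raw_field.splitlines() if line.strip()]
    return ", ".join(types)


warfield = join_license_types("47\n58\n30\n")
slims = join_license_types("47")
print(warfield)
print(slims)
```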

                    If it would be useful, I can make a tarball with the entire fileset I'm working from, which is basically a list of addresses, URLs and 3500 summary pages from the ABC site.

                    If you need any manual labor in collating this to the other list (when someone finds it), or scraping stuff, feel free. You're doing a very good thing, regardless of how crazy it makes you. I go to a relatively large number of shows, and if only half the club owners were half as passionate as you about sticking to your guns (webcasts, 18+)...

                    Off topic, is there a better way to find out camera policies for a show than calling each club each time? Does it vary more by artist? By club? I end up just never bothering.

                  • jwz says:

                    Awesome, thanks very much!

                    Yeah, can you mail me that tarball, with the sub-pages' content?

                    Camera policy is: DNA doesn't care, but sometimes the performers/promoters do. And pretty much the only way we can get an answer from the performers is by asking them to their faces, which means we don't know until a few hours before doors.

                    It's rare that cameras are prohibited, but it does happen on occasion.

  2. sparklydevil says:

    if you can't find the information online, you can file a public records request: