What you're hearing is how a network designed to send the noises made by your muscles as they pushed around air came to transmit anything, or the almost-anything that can be coded in 0s and 1s.
The frequencies of the modem's sounds represent parameters for further communication. In the early going, for example, the modem that's been dialed up will play a note that says, "I can go this fast." As a wonderful old 1997 website explained, "Depending on the speed the modem is trying to talk at, this tone will have a different pitch."
That is to say, the sounds weren't a sign that data was being transferred: they were the data being transferred. This noise was the analog world being bridged by the digital. If you are old enough to remember it, you still knew a world that was analog-first.
I used to spend afternoons trying to whistle lines of specific ASCII characters into the 110 baud analog coupler on the teletype. My inability to do this very well made me realize I was probably not cut out to be a singer. Oh, and also, the fact that I was trying to do this at all. For some reason, ampersands were easiest.