Of Carbon and Silicon
Tuesday, 19 October 2010
Today... er, yesterday. Tomorrow?

It's not really a proper blog, this, is it? Most people use their blogs for stuff like game reviews or news articles. The crap on this blog is what Facebook, MySpace, and Twitter are for. Mind you, none of those websites even existed when Of Carbon and Silicon went live for the first time back in 2005!

Anyway, onwards to lesser and greater things. Wait... never mind.

So, it's a jolly good thing that I've finally chosen which new mobile phone to get... because my current mobile phone died today. Not like "dead battery" -- like "time for a hardware upgrade". Like "no longer capable of functioning within established parameters". Cue the Marche funèbre by Frédéric Chopin.
Well, between now and the time when I actually get my new mobile, I had to have a stopgap. An emergency mobile. A Tracfone from Walgreens fits the bill. I had to initialise it today in order for it to work... which meant forty-five minutes on the line (the land line) with a series of computer prompts and access codes.
That's really why I bring this up. I mean, you don't care about my mobile phone -- you come here to read about my own experiences with and opinions of life in general.

I'm quite used to dealing with computerised answering services, through experience with my college loan debts. Banks and lending firms seem to enjoy having their customers press buttons and say "Yes", "No", and "English" in response to prompts. That way, they keep more money at the end of the pay period by not having to pay actual people to do that kind of tedious work.
So, I ring up the number I was provided on the little leaflet thing with the initialisation code on it. At first, it sounds like just another run-of-the-mill pre-recorded set of computer prompts. But, as I listen to the announcer, I start getting more and more creeped out by the minute. By the end of the deal, that announcer's voice is firmly entrenched in the uncanny valley.
"Why?" you might ask...
I thought I noticed something odd about the way the announcer said words. I finally figured it out: he was saying the word "Tracfone" exactly the same way each time. Whilst the pitch would fluctuate depending on usage, the inflections would not.
It finally hit me when I was receiving my activation code. The announcer was providing the code in nonsegmented strings of numbers. "Nonsegmented", in this case, meaning that it sounded as though the announcer was reading it from a sheet of paper. Pre-recorded prompts use short voice clips -- the announcers who record them are asked to recite each number in isolation, as though it were a complete utterance on its own. In effect, when you get on the line to a recording and it lists numbers, the list is segmented, as though each number were its own sentence. The Tracfone announcer was not doing this. I was given strings of three numbers at a time, each read as though the entire string were a sentence. I came to one inescapable conclusion...
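The difference is easy to sketch in code. This is a toy illustration, not how any real IVR system is built -- the clip filenames and the grouping into threes are invented for the example:

```python
# Toy illustration of "segmented" vs "nonsegmented" number reading.
# A pre-recorded prompt system stitches together one clip per digit;
# a synthesiser renders a whole group of digits as one utterance.
# (Filenames and the group size are invented for the example.)

def read_segmented(code):
    """Pre-recorded IVR style: one isolated clip per digit."""
    return [f"clip_{d}.wav" for d in code]

def read_synthesised(code, group=3):
    """TTS style: each three-digit group is one continuous utterance."""
    return [code[i:i + group] for i in range(0, len(code), group)]

code = "472913"
print(read_segmented(code))    # ['clip_4.wav', 'clip_7.wav', ...]
print(read_synthesised(code))  # ['472', '913']
```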
The Tracfone answering system announcer is a speech synthesiser.
The only alternative was that the numbers in all Tracfone activation codes come in the same strings of three, merely arranged differently. In that scheme, a human announcer would be given a list of every 3-digit string and asked to read them all like a list. That seems rather unlikely -- and if it were the case, then Tracfone activation codes would be supremely easy to crack (provided one had the right equipment).
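That hunch can be put in numbers. Assuming, purely for illustration, a 12-digit activation code and a vocabulary of twenty recorded 3-digit strings:

```python
# Back-of-the-envelope: how a fixed vocabulary of recorded 3-digit
# strings would collapse the code space. All figures are hypothetical.
digits = 12                   # assumed code length
full_space = 10 ** digits     # any digit in any position
clip_vocabulary = 20          # hypothetical set of recorded 3-digit strings
groups = digits // 3          # four 3-digit groups per code
reduced_space = clip_vocabulary ** groups

print(f"{full_space:,}")      # 1,000,000,000,000 possible codes
print(f"{reduced_space:,}")   # 160,000 -- a search space modest kit could cover
```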

I remember from research I did into robotics for a high-school project that AT&T have been working on a natural-sounding speech synthesis programme. The most readily accessible one to the public is on NOAA weather radios -- most markets use a voice called "Craig". This particular synth has been undergoing constant debugging, using the weather band as a testing platform (the public don't care about vocal inflections or whether this phoneme or that phoneme is processed correctly -- they just want to know if they're going to be struck by lightning). To that end, "Craig" is now able to change the pitch and tone of voice depending on the urgency of the situation (it might say, "Severe thunderstorm warning until 3:47 AM," as though it were reading a passage from an instruction booklet, whilst it might say "Tornado Emergency for the Omaha/Council Bluffs area!", as though it had just witnessed the Second Coming). Also, there is currently an endeavour to have it simulate breaths between sentences -- a feature already available in the latest version of MacInTalk.
So, I guess AT&T have made more progress than I thought in the relatively short stretch of time since my sophomore year of high school. Whatever this voice was that I heard on the line with Tracfone certainly had me fooled. And I like to think that I'm quite good at distinguishing acoustic and artificial sounds (I work with sound, after all).

As I mentioned before, it's really sort of creepy when you think about it. Not creepy like "cemetery at midnight" or "cold, dark room"... it's more like "nightmare that you remember the next day" creepy.
Now that Man has succeeded in creating an indiscernible artificial voice using phoneme sampling from acoustic speech, the next step is what we in the music trade call "acoustic modelling". In this case, creating an artificial vocal tract in a computer, using no speech samples at all. This is what I find totally disturbing -- up to this point, computers have always been reliant on humans for everything. Taking the human element out of speech synthesis is just one of many stops on the road to total computer supremacy.
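For the curious, that sample-free approach can be sketched with the classic source-filter model: a pulse train standing in for the glottis, filtered through resonators standing in for the vocal tract. The parameter values below are rough textbook figures for something like the vowel /a/, not anyone's production system:

```python
import math

def resonator(signal, freq, bw, fs):
    """Second-order digital resonator: one formant of the 'vocal tract'."""
    a1 = 2 * math.exp(-math.pi * bw / fs) * math.cos(2 * math.pi * freq / fs)
    a2 = -math.exp(-2 * math.pi * bw / fs)
    b0 = 1 - a1 - a2          # rough gain normalisation
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = b0 * x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

fs = 8000                     # sample rate (Hz)
f0 = 120                      # pitch of the glottal source (Hz)
n = fs // 10                  # 100 ms of audio
period = fs // f0

# The "source": a bare impulse train standing in for glottal pulses.
source = [1.0 if i % period == 0 else 0.0 for i in range(n)]

# The "filter": two formants roughly matching the vowel /a/.
voiced = resonator(resonator(source, 700, 80, fs), 1200, 90, fs)
```

No recorded speech appears anywhere in it -- every sample is computed from the model's parameters alone.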
In fact, with this (so far, theoretical) acoustic model, Man can begin taking liberties with the voice. At this stage of evolution, Man is the only species capable of vocal speech. A simple CT scan of any vertebrate's airway will provide the synth's programmers with enough information to begin adding human vocal parameters, thus providing voices to any non-human (and, verily, non-primate) animal.

Here's a bit of trivia from The Mind's Rubbish Bin that I find to be apropos...
From the Middle Ages all the way through to the 1700s, creating an apparatus capable of speech was considered witchcraft. Of course, most fields of modern scientific endeavour were considered witchcraft... however, I'm inclined to agree with them on this.
Soon after vocal modelling is perfected (perhaps even before), conversation simulators will be all the rage. I first became aware of these during downtimes in the 2003 school play. There was (and maybe still is) a programme on the internet called, I think, "Oliver". It was a text field that one could type a statement or question into and the programme would return an appropriate response. Of course, its "attention span" was not all that great -- it didn't carry on with a single subject at length, frequently asking random questions of its own ("What's your favourite colour?", "Where are you right now?", "Do you play an instrument?" -- conversations could easily turn into interviews with "Oliver").
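A programme of that sort can be mimicked in a few lines of keyword matching -- the rules and canned questions below are invented for illustration, not anything "Oliver" actually used:

```python
import random

# A toy keyword-matching chatterbot in the spirit described above.
# Rules and fallback questions are invented for illustration.
RULES = [
    ("colour", "Blue, I think. What's your favourite colour?"),
    ("instrument", "Do you play an instrument?"),
    ("hello", "Hello! How are you today?"),
]
QUESTIONS = [
    "Where are you right now?",
    "What's your favourite colour?",
    "Do you play an instrument?",
]

def reply(text):
    """Return the first rule whose keyword appears in the input;
    otherwise change the subject -- the 'interview' effect."""
    lowered = text.lower()
    for keyword, response in RULES:
        if keyword in lowered:
            return response
    return random.choice(QUESTIONS)

print(reply("Hello there"))          # Hello! How are you today?
print(reply("the weather is nice"))  # one of the canned questions
```

The random fallback is exactly why conversations drift: whenever no keyword matches, the bot fires off an unrelated question of its own.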
Of course... puh-shaw, man: 2003! A lot can happen in seven short years when it comes to technology. Once the conversation simulator is perfected and used in tandem with vocal modelling synthesis and artificial intelligence, the computer becomes its own life-form. Depending on how it is programmed, it can become either Data or Lore: interacting with humans to try to become human itself... or bent on universal conquest.

At this point, I would like to remind our eminent computer scientists of Asimov's First Law: a robot may not injure a human being or, through inaction, allow a human being to come to harm.
"Wait, robots? I thought we were talking about speech synthesisers!"
We were. However, all of these technological advancements will eventually culminate in the creation of a highly intelligent, independent electromechanical agent. Robots. Androids. Consider, for a moment, the following: the Star Trek: The Next Generation character Data is not, in fact, a human actor, but a fully functional android. This is, of course, incorrect, but just think about it for a couple of ticks. Without vocal tract modelling in computer software, Data would be incapable of speech... and if you don't have a robot that can speak, why should you have a robot at all?

Hm. I guess I go off on tangents enough to justify calling Of Carbon and Silicon a "blog" after all.


Posted by theniftyperson at 2:08 AM CDT
Updated: Thursday, 21 October 2010 7:53 PM CDT
