This post was supposed to come before my previous one, but that obviously didn’t happen. Now, I feel it is more important than ever to explain who/what TexBot is.
Quite simply, TexBot is a chatbot. He was originally written in Java and used JMegaHal, a Java port of MegaHAL. TexBot originally lived on the #squadbng IRC channel, and did much more than just talk. The original program could manage the channel, run a SHOUTcast server/DJing system, store and recall quotes, roll dice, and perform a few other random functions. I improved him several times over the years, and he went through several “brains” (due to various storage failures) composed of chat logs and other documents.
When first creating and using TexBot, I honestly didn’t have a good grasp of how he worked. I knew he used “Markov chains” and there were statistics and probabilities involved, but even after researching it I didn’t fully understand what was going on. I just knew he took in sentences and produced rather humorous results and people enjoyed conversing with him.
Fast forward about 4 years. I am now majoring in linguistics, with a particular interest in computational linguistics and natural language processing. And guess what? I actually understand what a “Markov chain”/Markov model is! I know how TexBot works, and I can modify his innards in various ways to make him more interesting. So what better way to toy around with my newfound knowledge than to revive TexBot? So that is what I plan to do.
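For anyone wondering what a Markov chain looks like in code, here is a minimal sketch of the general idea. To be clear, this is not TexBot's code or JMegaHal's, and JMegaHal's actual model is more elaborate than this. The class name `MarkovSketch`, the `<s>`/`</s>` boundary markers, and the method names are all made up for illustration. It is a first-order, word-level chain: it records which words have followed which, then builds a reply by repeatedly picking a random successor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative first-order Markov chain over words.
// Hypothetical sketch only: not the JMegaHal API and not TexBot's real code.
class MarkovSketch {
    // Made-up boundary markers for the start and end of a sentence.
    private static final String START = "<s>";
    private static final String END = "</s>";

    // For each word, every word that was seen immediately after it.
    // Duplicates are kept on purpose so frequent successors are picked more often.
    private final Map<String, List<String>> successors = new HashMap<>();

    // Learn the word-to-word transitions of one sentence.
    public void train(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        String previous = START;
        for (String word : trimmed.split("\\s+")) {
            successors.computeIfAbsent(previous, k -> new ArrayList<>()).add(word);
            previous = word;
        }
        successors.computeIfAbsent(previous, k -> new ArrayList<>()).add(END);
    }

    // Walk the chain from START, choosing a random successor at each step,
    // until the END marker is drawn or maxWords words have been produced.
    public String generate(Random rng, int maxWords) {
        StringBuilder reply = new StringBuilder();
        String current = START;
        for (int i = 0; i < maxWords; i++) {
            List<String> options = successors.get(current);
            if (options == null) {
                break; // nothing learned yet for this word
            }
            String next = options.get(rng.nextInt(options.size()));
            if (next.equals(END)) {
                break;
            }
            if (reply.length() > 0) {
                reply.append(' ');
            }
            reply.append(next);
            current = next;
        }
        return reply.toString();
    }

    public static void main(String[] args) {
        MarkovSketch bot = new MarkovSketch();
        bot.train("the bot likes to talk");
        bot.train("the channel likes the bot");
        // Output varies from run to run because successors are chosen at random.
        System.out.println(bot.generate(new Random(), 20));
    }
}
```

Because each step only looks at the single previous word, replies can wander from one training sentence into another halfway through, which is a big part of why the results come out so funny.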
The first order of business was finding my old Java code, which was a rather difficult task. I had to dig through various old folders and drives but finally came up with the old “TexBot” folder containing the main program, the various modules I wrote, and all the various brains and training data.
Next, I had to get him up and running. I stripped out the IRC code and the custom modules, leaving just the JMegaHal portion and the brain loader. I cleaned up the code a bit and simplified things. Then I had to worry about actually compiling and executing it. See, TexBot used to be a memory hog, took forever to start up, and had all sorts of other problems. I then realized that all those problems were 4 years ago. Guess what? Today TexBot runs just fine on my little netbook with the tiny Atom processor.
And that is how I got to that conversation posted late last night. TexBot lives! He is still using a brain from several years ago, but with the additional processing power and memory I have at my disposal, I plan to combine all my previous training data into a new brain.
From there, I plan to continue to refine the data and feed him new things and try to improve the responses. They won’t ever be perfect, as he currently uses relatively primitive modeling to select replies. However, I may be able to improve the model a bit using things I have learned in some of my classes.
I’m also looking at porting him over to a different language and getting him up on the web. Ideally, I’d like to have a chat interface on my website where people can talk to TexBot and provide more conversation data. I haven’t worked out the logistics of it, but it SHOULD be possible.
Why do this? For the hell of it, really. Because I think it could be fun. I loved tinkering with TexBot when I was first learning how to program. Now that I have a lot more programming experience and some NLP tricks up my sleeve, I think it will be fun to revisit. Plus, TexBot always provides entertainment with his replies.
I’ll have to dig up an old quote file or two for some examples…