Tuesday, November 28, 2006

Automated Calculation of Word Correlations from Text

(direct link to Word Document download, 1.4 MB)

I present algorithms for mining large texts to estimate how strongly two words are connected. The estimate is their correlation, a measure of how often the words appear near each other and how close together they appear. Hit counts from Google searches give a simple measure of correlation that draws on the vast corpus of the Internet. I also write a program that computes the correlation of words within a text more accurately. The correlation data is then used to locate conceptual clusters of words. Finally, I speculate on how one could extract semantic relationships from correlation patterns.
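
To make the idea of "how often and how close" concrete, here is a minimal Python sketch of one possible within-text measure. It is not the formula from the paper: the function names `tokenize` and `proximity_score`, the `window` parameter, the 1/distance weighting, and the normalization by the geometric mean of the two word counts are all assumptions chosen for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def proximity_score(tokens, word_a, word_b, window=5):
    """Illustrative correlation score for two words in a token list.

    Every pair of occurrences that lies within `window` tokens of each
    other adds 1/distance, so closer pairs count for more. The sum is
    divided by the geometric mean of the two word counts so that very
    frequent words are not favored automatically. (Assumed weighting
    and normalization, not taken from the paper.)
    """
    pos_a = [i for i, t in enumerate(tokens) if t == word_a]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b]
    if not pos_a or not pos_b:
        return 0.0
    total = 0.0
    for i in pos_a:
        for j in pos_b:
            distance = abs(i - j)
            if 0 < distance <= window:
                total += 1.0 / distance
    return total / (len(pos_a) * len(pos_b)) ** 0.5


if __name__ == "__main__":
    sample = tokenize("The cat sat on the mat. The cat ate.")
    print(proximity_score(sample, "cat", "sat"))
    print(proximity_score(sample, "cat", "mat"))
```

On this toy sentence, "cat" and "sat" score higher than "cat" and "mat" because they sit closer together. A matrix of such pairwise scores is the kind of input a clustering step could work from.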
