Automated Calculation of Word Correlations from Text
(direct link to Word Document download, 1.4 MB)
I present algorithms that mine large texts to estimate how strongly two words are connected. The estimate is their correlation: a measure of how often the words appear near each other, and how close together they appear. Hit counts from Google searches give a simple measure of correlation, using the vast corpus of the Internet. I also wrote a program that computes the correlation of words within a single text more accurately. I use the correlation data to locate conceptual clusters of words, and I speculate on how one could extract semantic relationships from correlation patterns.
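To make the idea of correlation concrete, here is a minimal sketch in Python. It is not the program described in the Word document. The 1/distance weighting, the window size, the normalization by the geometric mean of the word counts, and every function name are assumptions chosen for illustration. The second function shows one way to turn search hit counts into a score, again as an assumption; the caller has to supply the counts.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def proximity_scores(tokens, window=5):
    """Sum 1/distance over every pair of distinct words that occur
    at most `window` positions apart (assumed weighting)."""
    scores = defaultdict(float)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                pair = tuple(sorted((w, v)))  # order-independent key
                scores[pair] += 1.0 / (j - i)
    return scores

def correlation(tokens, a, b, window=5):
    """Proximity score of (a, b), divided by the geometric mean of
    the two word counts so that frequent words do not dominate."""
    counts = Counter(tokens)
    if counts[a] == 0 or counts[b] == 0:
        return 0.0
    raw = proximity_scores(tokens, window).get(tuple(sorted((a, b))), 0.0)
    return raw / (counts[a] * counts[b]) ** 0.5

def hit_count_correlation(hits_a, hits_b, hits_both):
    """Hypothetical hit-count analogue: pages containing both words,
    divided by the geometric mean of the single-word hit counts."""
    if hits_a == 0 or hits_b == 0:
        return 0.0
    return hits_both / (hits_a * hits_b) ** 0.5

if __name__ == "__main__":
    words = tokenize("The cat sat on the mat. The dog sat on the log.")
    print(correlation(words, "cat", "sat", window=2))
    print(correlation(words, "cat", "dog", window=2))
```

Words that never fall inside the same window get a score of zero, and closer pairs count for more than distant ones, which matches the two ingredients named above: how often, and how close.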
Tuesday, November 28, 2006