Word-sense disambiguation is not as easy as it sounds when you have two or more meanings for one word.  MIT is now using what is called “Topic Modeling” to determine which meaning is correct.  In the English language most double meanings are usually not even close however with clinical terminology that may not always be the case. 

The algorithm developed now will use the topic of conversation to try to determine which in free notes is applicable.  Freeform notes are always the last items on the list to be useful as free text is the the frontier when extracting information.  The accuracy rate is around 63 percent which also tells you there’s still more work to be completed.

In the MIT article they use the word “discharge” which is pretty common as an example which could refer to a body secretion or sending somebody home from the hospital.  This process has been used in other areas but not so far with clinical and biotech information.  The information is scheduled to be presented at American Medical Informatics Association annual conference this week.


At the American Medical Informatics Association’s (AMIA) annual symposium next week, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory will present a new system for disambiguating the senses of words used in doctors’ clinical notes. On average, the system is 75 percent accurate in disambiguating words with two senses, a marked improvement over previous methods. But more important, says Anna Rumshisky, an MIT postdoc who helped lead the new research, it represents a fundamentally new approach to word disambiguation that could lead to much more accurate systems while drastically reducing the amount of human effort required to develop them.

Where an ordinary topic-modeling algorithm will search through huge bodies of text to identify clusters of words that tend to occur in close proximity to each other, Rumshisky and her colleagues’ algorithm identifies correlations not only between words but between words and other textual “features” — such as the words’ syntactic roles. If the word “discharge” is preceded by an adjective, for instance, it’s much more likely to refer to a bodily secretion than to an administrative event.
Ordinarily, topic-modeling algorithms assign different weights to different topics: A single news article, for instance, might be 50 percent about politics, 30 percent about the economy, and 20 percent about foreign affairs. Similarly, the MIT researchers’ new algorithm assigns different weights to the different possible meanings of ambiguous words.