What is part-of-speech tagging and how can I use it?

Part-of-speech tagging is the process of labeling each word in a text with its grammatical category, such as noun, verb, or adjective. To analyze language effectively, a computer first has to identify these constituent parts of speech.

Native speakers possess an intuitive knowledge of parts of speech (often without an awareness of the underlying technicalities), but a computer needs training to carry out part-of-speech tagging.

Parts of speech commonly include the following:

  • Pronouns: words that stand in place of noun phrases. Common examples include I, you, she, they, and it.
  • Verbs: words that express actions or states, for example, going, jumping, running, or being.
  • Nouns: names for places, people, ideas, or things. Daniel, chairs, and New York are nouns.
  • Prepositions: words that express a position in space or time or introduce an object, for example, under, near, during, of, and with.
  • Conjunctions: words that connect ideas. Common conjunctions include because, and, but, and although.
  • Adverbs: words that describe verbs, adjectives, or other adverbs, such as heavily, gingerly, swiftly, and gracefully.
  • Adjectives: words that describe nouns, such as happy, beautiful, blue, or big.

The tricky part is that not all words fall neatly into a single category. For example, you can “play” (verb) a part in a “play” (noun). Many such words can only be classified correctly when additional context is available. An algorithm that sorts words into part-of-speech categories therefore has to do more than look each word up in a list: it also needs to analyze the surrounding words.

The part-of-speech-tagging challenge

It’s very easy for a customer service representative to look at a sentence and understand what it means, but the same cannot be said of a computer.

“I require a great deal of help.” To any service representative who reads it, this is a plea for assistance. In a support scenario, it means that a customer needs urgent help; as part of an opinion or feedback, it conveys a negative sentiment. A human can figure this out with ease. A computer algorithm, however, may simply classify the word “great” as positive. A customer support bot or sentiment analysis bot that looks only at the words “great” and “help” may then treat the sentence as positive feedback and respond incorrectly. This is why it’s so important for each word to be tagged correctly and considered in context: here, “great” modifies “deal” to express quantity, not approval.
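The failure described above can be reproduced with a few lines of Python. This is a minimal sketch of a context-blind keyword scorer, not a real sentiment system; the word lists and the function name naive_sentiment are hypothetical and chosen only for this illustration.

```python
# Hypothetical, deliberately tiny word lists used only for this illustration.
POSITIVE_WORDS = {"great", "good", "excellent", "help"}
NEGATIVE_WORDS = {"bad", "terrible", "broken"}


def naive_sentiment(sentence):
    """Label a sentence by counting keywords, ignoring grammar and context."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    positives = sum(w in POSITIVE_WORDS for w in words)
    negatives = sum(w in NEGATIVE_WORDS for w in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


# "great" and "help" both count as positive keywords, so the plea for
# assistance is mislabeled.
print(naive_sentiment("I require a great deal of help."))  # positive
```

Because the scorer never checks what role “great” plays in the sentence, it has no way to tell a quantity (“a great deal”) from praise (“great service”).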

How does part-of-speech tagging work?

As you can imagine, there is no hard and fast rule determining how speech is to be tagged. This entirely depends on your use case and the type of algorithm you’re trying to design. Here are a few of the tagging techniques you can use:

  • Rule-based: feeding in a preset list of rules for the algorithm to follow
  • Statistical model: learning to tag from a labeled dataset. Approaches include hidden Markov models, conditional random fields, (deep) neural networks, or a combination of these.

Rule-based tagging is attractive because its behavior is predictable: the algorithm assigns each word a tag by following a fixed list of tagging rules. The problem lies in creating the rules themselves. Classifying sentences in the English language can take hundreds if not thousands of rules, and the creator of the program will invariably miss a few. Additionally, languages evolve continually, so rules are likely to become outdated. Because the program is strictly rule-based, it cannot account for outliers: if an unfamiliar word sequence is thrown into the mix, it can be tagged incorrectly.
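To make this concrete, here is a minimal sketch of a rule-based tagger, assuming a small word list, a few suffix rules, and one context rule. The lexicon, the rules, the tag names, and the function name rule_based_tag are all hypothetical; a real rule set would be far larger.

```python
import re

# Hypothetical mini-lexicon: words whose default tag is listed by hand.
LEXICON = {
    "i": "PRON", "you": "PRON", "she": "PRON",
    "a": "DET", "the": "DET",
    "in": "PREP", "can": "AUX",
    "play": "VERB", "part": "NOUN",
}

# Hypothetical suffix rules, tried in order for words missing from the lexicon.
SUFFIX_RULES = [
    (r".*ly$", "ADV"),
    (r".*ing$", "VERB"),
    (r".*s$", "NOUN"),
]


def rule_based_tag(tokens):
    """Tag each token using the lexicon, then suffix rules, then one context rule."""
    tags = []
    for token in tokens:
        word = token.lower()
        tag = LEXICON.get(word)
        if tag is None:
            # Fall back to the first matching suffix rule, or NOUN if none match.
            tag = next(
                (t for pattern, t in SUFFIX_RULES if re.match(pattern, word)),
                "NOUN",
            )
        # Context rule: a word tagged VERB right after a determiner becomes a NOUN.
        if tag == "VERB" and tags and tags[-1] == "DET":
            tag = "NOUN"
        tags.append(tag)
    return list(zip(tokens, tags))


print(rule_based_tag("You can play a part in a play".split()))
```

The context rule is what lets the second “play” come out as a noun. The gaps show up just as quickly: in “She runs swiftly”, this sketch tags “runs” as a noun because the only rule that fires is the one for words ending in “s”.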

Statistical models learn the tagging “rules” automatically from examples. This makes them easier to maintain and faster to develop than hand-written rule sets. They are also more adaptable and transferable between domains. However, the quality of a model relies heavily on the quality of the labels in its training data. Statistical models may also require significantly larger datasets, since they need enough examples to generalize.
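As a rough illustration of learning tags from labeled data, the sketch below counts which tag each word received in a given context and reuses the most frequent one. It is a much simpler stand-in for the hidden Markov, conditional random field, and neural models mentioned above, not an implementation of any of them. The training sentences, tag names, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical hand-labeled training sentences; real datasets are far larger.
TRAINING_DATA = [
    [("you", "PRON"), ("play", "VERB"), ("a", "DET"), ("part", "NOUN")],
    [("she", "PRON"), ("saw", "VERB"), ("a", "DET"), ("play", "NOUN")],
    [("they", "PRON"), ("play", "VERB"), ("the", "DET"), ("game", "NOUN")],
    [("the", "DET"), ("play", "NOUN"), ("was", "VERB"), ("long", "ADJ")],
]


def train(sentences):
    """Count tags per (previous tag, word) pair, per word, and overall."""
    context_counts = defaultdict(Counter)
    word_counts = defaultdict(Counter)
    tag_counts = Counter()
    for sentence in sentences:
        previous = "<START>"
        for word, pos in sentence:
            context_counts[(previous, word)][pos] += 1
            word_counts[word][pos] += 1
            tag_counts[pos] += 1
            previous = pos
    return context_counts, word_counts, tag_counts


def tag_sentence(tokens, model):
    """Greedily pick the most frequent tag, backing off when context is unseen."""
    context_counts, word_counts, tag_counts = model
    default_tag = tag_counts.most_common(1)[0][0]
    previous = "<START>"
    tagged = []
    for token in tokens:
        word = token.lower()
        if (previous, word) in context_counts:
            best = context_counts[(previous, word)].most_common(1)[0][0]
        elif word in word_counts:
            best = word_counts[word].most_common(1)[0][0]
        else:
            best = default_tag  # unseen word: fall back to the most common tag
        tagged.append((token, best))
        previous = best
    return tagged


model = train(TRAINING_DATA)
print(tag_sentence("They play a play".split(), model))
```

No rule about “play” was written by hand here: the distinction between the verb and the noun comes entirely from the labeled examples, which is also why mislabeled or sparse training data translates directly into tagging mistakes.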

The ability to adapt to new domains is essential because language is heavily context-dependent. Compare the language used on Twitter or in chat rooms with that in legal documents, for example. This is a major problem in sentiment analysis of online chat rooms, where the text is non-standard (with elements of sarcasm, hyperbole, informal speech, and syntax errors) and can easily mislead even the best NLP algorithms.

Where can I use part-of-speech tagging?

Part-of-speech tagging is essential whenever you want to automatically analyze human language data. This application is most visible in voice-driven IoT devices such as those running Amazon Alexa. Part-of-speech tagging also forms the backbone of NLP engines (like automated support), translation apps, sentiment analysis, and a lot more. We will look at this in more detail in an upcoming blog post.

How can I use part-of-speech tagging?

The initial work required to create a part-of-speech tagging engine is immensely difficult, but when done correctly, it can ultimately save thousands of hours of manual labor. This sort of tagging can take grammar and spell checkers to the next level: instead of checking only for the usual dictionary-based spelling errors, a checker can also examine part-of-speech-tagged text for syntactic and contextual errors. Tagging also makes it possible to analyze large amounts of discussion or data and gain insight into user, customer, audience, or worldwide sentiment. This has major implications for things like stock market analysis, election campaign forecasts, business decision making, and much more. As tagging improves the accuracy of computer-based customer support responses, it could help companies grow their client base without necessarily having to increase support costs proportionally.

When it comes to the immensity of the training process, Canotic can make this easier. We turn raw text data into cleanly labeled training data, ready to fine-tune your NLP algorithm. If you’d like to learn more about how we can help, reach out to us.