ARTIFICIAL INTELLIGENCE (25) – Natural Language Processing (5) Fundamentals

Natural Languages are languages used by humans.

The aim of Natural Language Processing is to communicate with a computer in Natural Language and to receive its answers in Natural Language as well.

NLP is a subfield of AI and uses ML/DL techniques. Source: Sathiyakugan 2018.


In Natural Language Processing (NLP) we often deal with unstructured data: data that is not organized, indexed, or referenced. To understand this data, an NLP system must learn its structure and grammar. The majority of a company's data is unstructured.

Natural Languages are hard for computers because they are imprecise and ambiguous. Ambiguities in Natural Language can be Lexical (one word with several meanings, e.g. "bank"), Syntactic (one sentence with several possible parses, e.g. "I saw the man with the telescope"), or Referential (an unclear pronoun antecedent, e.g. "the trophy didn't fit in the suitcase because it was too big").

Natural Language Processing (NLP) solves a wide range of problems by enabling machines to understand and work with human language. Key applications include:

  • Sentiment analysis: identifies whether text expresses positive, neutral, or negative opinions.
  • Machine translation: allows content to be understood across different languages.
  • Question answering: provides answers based on large knowledge sources.
  • Text summarization: condenses long content.
  • Text classification: categorizes information or detects spam.
  • Text-to-speech: converts written text into spoken language.
  • Speech recognition: transforms spoken language into text.

Natural Language Processing (NLP) is generally divided into two main components: Natural Language Understanding (NLU) and Natural Language Generation (NLG).

NLU focuses on analyzing language by converting text or speech into meaningful representations. It aims to resolve ambiguities, capture context, and understand the deeper meaning of language beyond simple grammar or sentence structure.

NLG, on the other hand, is about generating language. It takes internal representations and produces coherent text by selecting appropriate words and organizing them into well-structured sentences.

In simple terms, NLU deals with analysis, while NLG handles synthesis.

A typical NLP pipeline includes three main stages: text processing, feature extraction, and decision making. These stages can use classical NLP methods, machine learning, or neural networks, which require training on large datasets before making predictions.
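The three stages above can be sketched as a toy pipeline. This is a minimal illustration, not a real system: the stop-word and sentiment word lists, function names, and the keyword-count "classifier" are all simplifying assumptions made for this example.

```python
import re
from collections import Counter

# Toy word lists -- illustrative assumptions, not standard resources.
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "this"}
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def process(text):
    """Stage 1, text processing: lowercase, tokenize, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def extract_features(tokens):
    """Stage 2, feature extraction: bag-of-words counts."""
    return Counter(tokens)

def decide(features):
    """Stage 3, decision making: sentiment from keyword counts."""
    pos = sum(c for w, c in features.items() if w in POSITIVE)
    neg = sum(c for w, c in features.items() if w in NEGATIVE)
    return "positive" if pos > neg else "negative" if neg > pos else "neutral"

print(decide(extract_features(process("The service was great, the food excellent"))))
# -> positive
```

A production pipeline would replace each stage with a trained component (e.g. learned embeddings instead of raw counts, a trained classifier instead of keyword lookup), but the stage boundaries stay the same.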

Text processing transforms raw text into a structured format by identifying elements like words, phrases, and parts of speech. It also includes techniques such as stemming and lemmatization to reduce words to their root forms, removing stop words and punctuation, and identifying entities through Named Entity Recognition (NER). Additionally, coreference resolution links pronouns to the correct entities.

Overall, text processing analyzes language at three levels: syntax (structure), semantics (meaning), and pragmatics (contextual meaning).
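The text-processing steps described above (tokenization, stop-word removal, stemming) can be sketched in a few lines. This is a deliberately naive illustration: the stop-word list and suffix rules are toy assumptions, where real systems use tools such as NLTK or spaCy with far more complete rules.

```python
import re

# Toy stop-word list -- an assumption for this sketch only.
STOP_WORDS = {"the", "and", "are", "to", "of"}

def tokenize(text):
    # Lowercase and split on anything that is not a letter.
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    # Strip a few common suffixes -- far cruder than Porter stemming
    # or dictionary-based lemmatization.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

sentence = "The runners are running and jumped to the finish"
tokens = remove_stop_words(tokenize(sentence))
print([naive_stem(t) for t in tokens])
# -> ['runner', 'runn', 'jump', 'finish']
```

Note that the naive stemmer maps "running" to "runn" rather than "run": this is exactly the kind of error that motivates proper stemming algorithms and lemmatization, which uses a dictionary to recover true root forms.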

Milestones

  • 1948: Early NLP begins with dictionary-based machine translation systems, initially for German-to-English and later Russian-to-English.
  • 1957: Noam Chomsky revolutionizes linguistics with Syntactic Structures, influencing NLP and formal language representations.
  • 1966: The ALPAC report highlights poor machine translation results, leading to reduced funding but continued research.
  • 1970: NLP shifts toward AI-driven semantic understanding and knowledge representation, with systems like SHRDLU and LUNAR.
  • 1980: Machine Learning drives the rise of statistical NLP using corpora and models like Hidden Markov Models.
  • 1982: Jabberwacky marks the early development of chatbots aiming to simulate human conversation.
  • 1998: The FrameNet project advances semantic role labeling and shallow semantic parsing.
  • 2001: Neural networks and word embeddings are introduced for language modeling, paving the way for RNNs and LSTMs.
  • 2003: Latent Dirichlet Allocation becomes a standard technique for topic modeling.
  • 2013: Word2Vec popularizes embeddings, boosting the use of neural networks like RNNs, LSTMs, and CNNs in NLP.
  • March 2016: Microsoft Tay demonstrates risks of AI chatbots after being shut down for inappropriate behavior.
  • September 2016: Google introduces Neural Machine Translation, significantly improving translation accuracy.


References:

Devopedia. Natural Language Processing


Creative Commons License © Yolanda Muriel – Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
