Tokenization is the process of breaking text down into sentences and words. It splits a text into smaller pieces (known as tokens) while discarding some characters, such as punctuation.
Consider the following example:
Text input: Potter walked to school yesterday.
Text output: Potter, walked, to, school, yesterday
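To make the example concrete, here is a minimal sketch in Python. It assumes a simple regular-expression word tokenizer; the function name `tokenize` is illustrative and not taken from any library. Real tokenizers handle contractions, hyphens, and sentence boundaries with more care.

```python
import re

def tokenize(text):
    """Split text into word tokens, dropping punctuation (hypothetical helper)."""
    # \w+ matches runs of letters, digits, and underscores,
    # so punctuation and whitespace never end up in a token.
    return re.findall(r"\w+", text)

tokens = tokenize("Potter walked to school yesterday.")
print(tokens)  # ['Potter', 'walked', 'to', 'school', 'yesterday']
```

The trailing period is discarded because it is not a word character, which leaves five tokens for the five words in the sentence.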