Python String Data Type Tutorial

In this tutorial we learn about strings as immutable collections of Unicode characters.

We cover their quoting, escape characters, concatenation, and why f-strings are the best formatting to use.

What is a string?

The string data type is one of the collection data types in Python. A string is an immutable collection of Unicode characters.

How to declare/initialize a string

Python allows us to easily declare strings by just wrapping letters, words or sentences in either single or double quotes.

Syntax: string in single or double quotes
variable_name = 'string value'
# or
variable_name = "string value"
Example: string in single quotes
message = 'Hello World'

print(message)

The string collection

As mentioned before, a string is a collection of single characters.

In some programming languages, like C, there is no dedicated string type, so we have to define arrays of characters explicitly.

Example: string in the C programming language
#include <stdio.h>
int main()
{
    char message[] = "Hello World";

    printf("%s\n", message);

    return 0;
}

In Python we can define a string directly. However, a string is still a collection of characters under the hood.

Think of a string as a row in a table with each character in its own separate cell.

If we consider the word “Hello”, this is what it would look like:

| H | e | l | l | o |

Each character is also mapped to an index, a number that represents its position in the string.

Considering the word “Hello” again, this is what it would look like:

| 0 | 1 | 2 | 3 | 4 |
| H | e | l | l | o |

How to access characters in a string with the indexer

We can use the index number of a character to access its value. We specify the index number of the character we want to access between [ ] (open and close square brackets).

Syntax:
 string_variable_name[index]
Example: string access via index
message = 'Hello'

print("Character 1: ", message[0])
print("Character 2: ", message[1])
print("Character 3: ", message[2])
print("Character 4: ", message[3])
print("Character 5: ", message[4])

In the example above we access each character of the string individually by using its index in the collection.

Note: The index numbers of an indexed collection always start at 0.
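
Because indexes start at 0, the last valid index is always one less than the number of characters. The following is a minimal sketch of that relationship using the built-in len() function. The variable names are only illustrative.

```python
message = 'Hello'

# len() returns the number of characters in the string
length = len(message)
print("Length:", length)

# Indexes run from 0 up to length - 1
first = message[0]
last = message[length - 1]
print("First character:", first)
print("Last character:", last)

# An index outside that range raises an IndexError
try:
    message[length]
except IndexError as error:
    out_of_range = str(error)
    print("IndexError:", out_of_range)
```

The try block is there only to show the error without stopping the script.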

String Quotes

As mentioned, we may use either single or double quotes, but the opening and closing quote of a string must be the same type.

Example: mismatched quotes
 message = 'Hello" #SyntaxError

A string may also not be initialized without quotes.

Example: no quotes
 message = Hello #NameError

Single vs double quotes. When to use which

Both single and double quotes are often used inside strings. If we don’t want to escape quote characters inside the string, we can simply use the opposite quotes as the string wrapper.

Example: string using double quotes
# The following string has a single quote
# inside and so is enclosed in double quotes
print("This string is valid because it's enclosed in double quotes.")

In the example above, the string is enclosed with double quotes so we don’t need to explicitly escape the single quote.

Example: string using single quotes
# The following string has double quotes inside
# and so is enclosed within single quotes
print('"One, two, five!" - King Arthur. Monty Python and the Holy Grail')

In the example above, we use double quotes inside the string so we simply wrap the whole string in single quotes.
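
As a small sketch of the points above: the quote style is only a wrapper and not part of the value, and escaping an inner quote with a backslash gives the same string as switching the outer quotes. The variable names are made up for illustration.

```python
single = 'Hello World'
double = "Hello World"

# The quote style is not part of the value, so both strings are equal
print(single == double)

# Escaping the inner quote with a backslash gives the same result
# as switching the outer quotes
escaped = 'It\'s valid'
switched = "It's valid"
print(escaped == switched)
```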

How to change a string value

Strings are immutable, so the characters of an existing string cannot be changed. However, we can assign a new value to the same variable that holds a string.

If we try to change a character inside the string, the interpreter will raise an error.

Example:
message = 'Hello'

message[0] = 'Y'

In the example above we try to change the H character to a Y, but because a string is immutable the interpreter raises a TypeError.

Output:
 TypeError: 'str' object does not support item assignment

If we want to change a string at runtime, we have to overwrite it with a new string completely.

Example:
message = "Hello"
print(message)

message = "Greetings"
print(message)

In the example above the variable message is pointed at a new string. The old string itself is never modified, only discarded.
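
If only part of the text should change, one option is to build a new string from pieces of the old one. The sketch below assumes slicing (message[1:]) and the str.replace() method, neither of which is covered in this tutorial. Both return new strings and leave the original untouched.

```python
message = 'Hello'

# Build a new string: 'Y' plus everything from index 1 onward
changed = 'Y' + message[1:]
print(changed)

# The original string is untouched
print(message)

# str.replace() also returns a new string rather than editing in place
replaced = message.replace('H', 'J')
print(replaced)
```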

How to break a string in source code

Sometimes in our source code we may need to break up a string onto multiple lines. In Python, a string in single or double quotes can’t simply continue onto the next line.

Example: typical source code line break
message = "Hello
        world"

If we use the example above in a Python script, the interpreter will produce a SyntaxError.

Output:
 SyntaxError: EOL while scanning string literal

The interpreter encounters an End Of Line before the closing quote, so the string is never closed. Python 3.10 and later word this error as "unterminated string literal".

To break a string onto multiple lines in the source code, we split it into separately quoted pieces and end every line except the last with a \ (backslash). Python joins the adjacent pieces into a single string.

Syntax:
"line 1" \
"line 2"

Both lines in the syntax example above are enclosed with their own quotes.

Example: correct source code string line break
message = "Hello " \
        "World"

print(message)

In the example above the string is broken up into multiple lines in our source code. However, when we print the string, it’s still on a single line.

If we wanted to create new lines in print, we would have to use an escape character or triple quotes.
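
The difference between a line break in the source code and a line break in the output can be sketched as follows. The variable names are illustrative.

```python
# Adjacent string literals are joined into one string; the backslash
# only continues the statement onto the next source line
joined = "Hello " \
         "World"
print(joined)

# No newline character was added by the line break in the source
print("\n" in joined)

# To get a line break in the output, include \n explicitly
two_lines = "Hello\nWorld"
print(two_lines)
```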

String triple quotes

Python’s triple quotes allow strings to span multiple lines. Line breaks, tabs, and single or double quote characters can be typed directly into the string without escaping them.

To initialize a triple quote string we wrap our string in 3 single or double quotes.

Syntax:
"""
This string can span
over multiple lines
"""
Example: triple quoted string
message = """
Triple quotes not only allow us to
break a string into new lines, both
in the source code and in print, but
it also allows verbatim tabs and
special characters without escaping
them.

Example:
    Tabbed content
    @ # $ % ^ & *
"""

print(message)

In the example above, the string is printed exactly as it’s formatted in the source code.
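
To see exactly what a triple quoted string stores, we can look at it with the built-in repr() function. This is a minimal sketch with made-up content. It also shows that escape sequences such as \t are still interpreted inside triple quotes.

```python
message = """
Line one
Line two
"""

# repr() shows the newline characters stored in the string, including
# the one right after the opening quotes and the one before the closing quotes
print(repr(message))

# Escape sequences are still interpreted inside triple quotes
tabbed = """A\tB"""
print(tabbed)
print(len(tabbed))
```

If the leading line break is not wanted, the text can start directly after the opening quotes.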

String escape characters

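Characters that are hard to type, or that would otherwise end the string, can be written with backslash notation. These escape sequences work in every kind of quoted string, including triple quotes.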

Example: escaped character
tab = 'A horizontal \t tab'

print(tab)

The following table lists some of the commonly used escape characters:

Sequence   Description            Example                    Output
\\         Backslash              print('\\')                \
\'         Single quote ( ' )     print('\'')                '
\"         Double quote ( " )     print("\"")                "
\n         Line feed (new line)   print('Hello \n World')    Hello
                                                              World
\t         Horizontal tab         print("Hello \t World")    Hello    World
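
An escape sequence takes two characters to type but is stored as a single character. The sketch below illustrates this with len() and repr(). The file path is a hypothetical value used only for illustration.

```python
# Each escape sequence is stored as a single character
newline_length = len('\n')
tab_length = len('\t')
backslash_length = len('\\')
print(newline_length, tab_length, backslash_length)

# repr() shows the escaped form, print() shows the resulting text
path = 'C:\\temp\\notes.txt'   # hypothetical path, for illustration only
print(repr(path))
print(path)
```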

String Concatenation

To combine, or concatenate, multiple strings together, we use the + (plus) operator.

Syntax:
 "string" + "string"
Example:
a = "Hello "
b = "World"

print(a + b)

In the example above, we leave an extra space at the end of Hello as a separator between the words.
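
Instead of storing the space inside one of the strings, the separator can also be added explicitly. The following sketch shows this, along with the str.join() method, which is not covered in this tutorial and is included only as an assumption about what may be convenient for longer lists of words.

```python
a = "Hello"
b = "World"

# Add the separator explicitly instead of storing it in one of the strings
greeting = a + " " + b
print(greeting)

# str.join() places the separator between every item of a list
words = ["Hello", "World", "again"]
sentence = " ".join(words)
print(sentence)
```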

String Formatting

When we want to combine other data with a string, Python won’t convert it automatically, so we need some sort of string formatting. Fortunately we have several options:

  • %-formatting
  • .format() function
  • f-Strings

As an example let’s look at the following code:

Example: incorrect string formatting
name = "Monty"
age = 30

print('Name: ' + name)
print('Age: ' + age)

When we use the + operator on two strings, the interpreter concatenates them. When we use it on two numbers, such as int values, the interpreter does arithmetic.

In the example above we mix a string with an int, so the interpreter can do neither and raises a TypeError.

Output:
 TypeError: can only concatenate str (not "int") to str

This is where string formatting comes to the rescue.
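
For a quick fix, an explicit conversion with the built-in str() function also avoids the error. This is a minimal sketch reusing the variables from the example above.

```python
name = "Monty"
age = 30

# str() turns the int into a string so that + can concatenate it
line = 'Age: ' + str(age)
print('Name: ' + name)
print(line)
```

This works, but it becomes tedious with many values, which is why the formatting options below are usually preferred.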

%-formatting

The original method to format a string was with the % (percent) operator. A placeholder that starts with % is placed within the string at the location where we want our data to appear, and the interpreter replaces it with the value given after the % operator that follows the string.

Syntax:
 print("%x" % value)

The % operator is followed immediately by a character that denotes the type of data it is a placeholder for.

Example: replace %i with an int
 print("%i" % 30)

In the example above we use an int as a value, so we use %i as the placeholder.

The following table shows the characters to be used in string formatting:

Character   Description
%c          Single character
%s          String (the value is converted with str() prior to formatting)
%i          Signed decimal integer
%d          Signed decimal integer
%u          Obsolete, identical to %d
%o          Octal integer
%x          Hexadecimal integer using lowercase letters
%X          Hexadecimal integer using uppercase letters
%e          Exponential notation with lowercase e
%E          Exponential notation with uppercase E
%f          Floating point real number
%g          Floating point; uses %e style for very small or very large exponents, %f style otherwise
%G          Same as %g, but uses %E style for the exponential form
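
As a sketch of how several of these characters combine, multiple values can be passed as a tuple after the % operator. The height value is hypothetical and used only for illustration.

```python
name = "Monty"
age = 30
height = 1.756  # hypothetical value, for illustration only

# Several values are passed as a tuple, matched to placeholders in order
line = "%s is %d years old and %.2f m tall" % (name, age, height)
print(line)

# %x, %X and %o render an integer in hexadecimal and octal
codes = "%x %X %o" % (255, 255, 8)
print(codes)
```

The .2 in %.2f limits the output to two decimal places.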

Don't use %-formatting

%-formatting isn’t great because it’s verbose and has quirks that lead to common errors, like not displaying tuples and dictionaries correctly.

The official Python documentation points out these quirks and notes that f-Strings or the .format() function may help avoid them.
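
One such quirk can be sketched with a tuple. Because a tuple after the % operator is treated as the list of values to insert, a single placeholder combined with a two-item tuple fails. The variable names are illustrative.

```python
point = (3, 4)

# A tuple on the right of % is unpacked as the list of values,
# so one placeholder with a two-item tuple fails
try:
    text = "Point: %s" % point
except TypeError as error:
    problem = str(error)
    print("TypeError:", problem)

# Wrapping the tuple in a one-item tuple works, but is easy to forget
text = "Point: %s" % (point,)
print(text)
```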

.format() function

Python 2.6 introduced a better way to format strings with the .format() function (strictly speaking, a method of the string). The placeholder fields are marked with open and close curly braces, and the values to insert are passed as arguments.

Syntax:
 "string {} string {} string".format(replace, replace)
Example: string format() function
message = "Hello {}, welcome to the Python {} tutorial".format("there", "string")

print(message)

In the example above each pair of curly braces is replaced, in order, with the corresponding argument passed to the function.

We can also refer to the arguments by position, using numbers inside the braces to choose which argument goes where.

Syntax:
 "string {2} string {0} string {1}".format(replace_0, replace_1, replace_2)
Example: order replacements by number in the format() function
message = "Hello {1}, welcome to the Python {0} tutorial".format("string", "there")

print(message)

We can go a step further and use names inside the braces, which are filled from keyword arguments. This also lets us pass objects and reference their attributes, or use ** to unpack a dictionary.

The point is that the .format() function is definitely a step up from %-formatting.
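
For reference, a minimal sketch of named fields and dictionary unpacking with ** could look like this. The names name, topic, age and person are made up for illustration.

```python
# Named fields are filled from keyword arguments
message = "Hello {name}, welcome to the {topic} tutorial".format(
    name="there", topic="string"
)
print(message)

# ** unpacks a dictionary into keyword arguments
person = {"name": "Monty", "age": 30}
summary = "{name} is {age} years old".format(**person)
print(summary)
```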

Don't use the .format() function

The .format() function isn’t great because it is still quite verbose, especially when dealing with multiple parameters in longer strings.

f-Strings

Python 3.6 introduced us to f-Strings, or “formatted string literals”. f-Strings are string literals that have curly braces containing the expressions that will be replaced with their respective values. The expressions are formatted using the __format__ protocol.

The syntax is similar to that of the .format() function but much less verbose. An f-String requires us to prefix the string with the letter f.

Syntax:
variable_name = value

f"string {variable_name} string"
Example: f-String with variables
name = "General Kenobi"
trait = "bold"

message = f"{name}. You are a {trait} one."

print(message)

In the example above we specify the variable names that we want to replace with their corresponding values between curly braces.
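
Because the values are formatted through the __format__ protocol, a format specification can follow a colon inside the braces. The sketch below shows a few common specifications. The values are hypothetical and used only for illustration.

```python
price = 1234.5   # hypothetical values, for illustration only
ratio = 0.256

# A format specification follows a colon inside the braces
two_decimals = f"Price: {price:.2f}"    # two decimal places
thousands = f"Price: {price:,.2f}"      # thousands separator
percentage = f"Ratio: {ratio:.1%}"      # percentage with one decimal place
padded = f"Padded: {42:>6}"             # right-aligned in 6 characters

print(two_decimals)
print(thousands)
print(percentage)
print(padded)
```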

f-String Expressions

Because the expressions in an f-String are evaluated at runtime, we can use almost any valid Python expression inside the braces.

Example: arithmetic inside an f-string
 print(f"3 + 4 = {3 + 4}")

We can also call functions inside of f-Strings.

Example: call a function inside an f-string
def is_instrument(instrument):
    # Only "Guitar" and "Piano" count as instruments in this example
    if instrument not in ("Guitar", "Piano"):
        return f"No Patrick, {instrument} is not an instrument"
    else:
        return "Yes, it is"

print(f"Is Mayonnaise an instrument? {is_instrument('Mayonnaise')}")

We can even use objects created from classes. However, we won’t look at it here because we haven’t covered classes and objects yet and the code would be too confusing.

Multi-line f-Strings

Just as with regular strings, we can break up an f-string into multiple lines in the source code.

We don’t need the backslash to indicate continuation here. Instead, we wrap all the strings in parentheses, which also works for regular strings.

Syntax:
(
    f"string line 1"
    f"string line 2"
    f"string line 3"
)
Example:
name = "Monty"
language = "Python"
version = 3
topic = "f-Strings"

message = (
    f"Hello there {name}. "
    f"Welcome to the {language} {version} "
    f"{topic} tutorial"
)

print(message)

In the example above, we wrap our f-Strings in open and close parentheses which allows us to break them up into separate lines in the source code.

Each line requires its own wrapping quotes. The f prefix is needed on every line that contains a replacement field; a line without one can be a regular string.
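
Strictly, only the pieces that contain replacement fields need the f prefix. The following sketch illustrates this and shows what happens when the prefix is left off a line that has braces. The variable names are illustrative.

```python
name = "Monty"

message = (
    "Hello there. "           # no replacement field, no f prefix needed
    f"Your name is {name}. "  # replacement field, f prefix required
    "Welcome."
)
print(message)

# Without the prefix the braces are kept as literal text
literal = "Your name is {name}."
print(literal)
```

Adding the f prefix to every line is harmless, so many people do it for consistency.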

Use f-Strings instead of other string formatting

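f-Strings are not only the most readable way to format strings, they are also generally faster at runtime than %-formatting and the .format() function.

If you are working with Python 3.6 or later, f-Strings are the better choice in most cases. The .format() function remains useful when the template string is defined separately from the values that fill it.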
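
To compare the three approaches side by side, here is a minimal sketch that builds the same string with each of them. The variable names are illustrative.

```python
name = "Monty"
age = 30

percent_style = "%s is %d years old" % (name, age)
format_style = "{} is {} years old".format(name, age)
f_string_style = f"{name} is {age} years old"

# All three produce the same string
print(percent_style == format_style == f_string_style)
print(f_string_style)
```

The f-String version keeps each value next to the place where it appears, which is the main readability gain.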

Summary: Points to remember

  • Strings can use '' (single), "" (double) or """ (triple) quotes.
  • A string is a collection of characters that can be accessed with the indexer.
  • A string is immutable, however we can “overwrite” a string variable with a new string at runtime.
  • We can use the \ (backslash) to break strings up into multiple lines in the source code.
  • Backslashes, new lines, tabs and quotes that match the string’s wrapper can be written with a \ (backslash) escape sequence.
  • Don’t use %-formatting to format strings.
  • Use the .format() function to format strings only if you are working with Python 2.6 to 3.5.
  • Use f-Strings to format strings if you are working with Python 3.6 and up.