Moving up the stack – time to learn Python – Part 4

In the last article on my journey into Python, we talked in depth about the various operators that Python can use.  If you managed to plough your way through that post, my hat is off to you.

To refresh your memory on the first three articles, you can read them at the links shown below:

Moving up the stack – time to learn Python – part 1

Moving up the stack – time to learn Python – part 2

Moving up the stack – time to learn Python – part 3

Today we will continue talking about the building blocks and look at how Python handles String formatting.

Why are we interested in strings when using Python?

Strings are important because they are one of the most common and versatile data types in programming. Strings can store text, numbers, symbols, or any combination of them. Strings can be manipulated, compared, searched, formatted, and encoded in various ways. Strings can also be used to communicate with other programs, databases, web servers, or users. Strings are essential for creating user interfaces, web pages, documents, reports, and many other applications that involve text processing.

String manipulation is perhaps one of the most important abilities a programming language needs to have; in fact, you could say that it is vital, especially when analysing data. Python provides many built-in string manipulation functions.  In this post, we’ll look at the most commonly used ways Python offers to format strings.

What is a string?

A Python string is an ordered sequence of characters. A character is anything you can type on the keyboard in one keystroke, like a letter, a number, or a backslash. Strings can also contain spaces, tabs, and newline characters. At its simplest, a string is any text, number, symbol, or combination of these, enclosed in a pair of single or double quotes.  In our first article we were introduced to the string with this very simple script.

print("hello world")
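
To make the idea of "characters in an order" concrete, here is a minimal sketch that treats the same greeting as a sequence. The variable names are my own, chosen purely for illustration.

```python
# A string behaves like an ordered sequence of characters.
greeting = "hello world"

first_char = greeting[0]     # indexing starts at 0
last_char = greeting[-1]     # negative indexes count from the end
length = len(greeting)       # the space counts as a character too

# Single and double quotes create exactly the same kind of object.
same = 'hello world' == "hello world"

print(first_char, last_char, length, same)
```

Because the order is fixed, position 0 always gives you the first character, whatever the string contains.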

So that is great, but there must be more, right? 😉

String Manipulation in Python.

The Zen of Python states that there should be one, and preferably only one, obvious way to do something.  Well, as we know, for every rule there is an exception, and in this case there are actually four: Python has four ways of formatting strings.  For the rest of this article we will investigate them.  The first is commonly known as C-style formatting or “simple positional formatting”. You will be pleased to know it is very similar to formatting in Go; in fact, the official Python documentation refers to it as “printf-style String Formatting”, so if you are also following our series on GoLang, you will have a head start here.

Printf style string formatting

If you have been following our series on GoLang you will most likely recognise the table below.  To use a conversion, prefix the desired letter with a “%” symbol.

Conversion Meaning
‘d’ Signed integer decimal.
‘i’ Signed integer decimal.
‘o’ Signed octal value.
‘u’ Obsolete type – it is identical to ‘d’.
‘x’ Signed hexadecimal (lowercase).
‘X’ Signed hexadecimal (uppercase).
‘e’ Floating point exponential format (lowercase).
‘E’ Floating point exponential format (uppercase).
‘f’ Floating point decimal format.
‘F’ Floating point decimal format.
‘g’ Floating point format. Uses lowercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise.
‘G’ Floating point format. Uses uppercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise.
‘c’ Single character (accepts integer or single character string).
‘r’ String (converts any Python object using repr()).
‘s’ String (converts any Python object using str()).
‘a’ String (converts any Python object using ascii()).
‘%’ No argument is converted, results in a ‘%’ character in the result.

In Python, the two “%” conversions you will interact with the most are “%d” for integers and “%s” for strings.

A more detailed overview

As stated, “%s” is a placeholder for a string. It lets us insert a string, usually held in a variable, into another string. The “%s” placeholder is put where the string should appear, and Python uses the built-in function “str()” to convert the value. The “%d” placeholder does the same job for integer values: it is put where the integer should appear and lets us print whole numbers inside strings.

Here’s an example:

name = 'Tom'
age = 58
print('My name is %s and I am %d years old.' % (name, age))

This will output:

My name is Tom and I am 58 years old.

The above example also introduces the “%” character on its own, where it is used as the string formatting operator. It formats a string by replacing each placeholder on its left with the corresponding value on its right. Each placeholder is put where its value should appear.
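
The right-hand side of the “%” operator can take a few shapes. The sketch below shows a single value, a tuple, and a dictionary with named placeholders; the dictionary keys “who” and “years” are made up for this illustration.

```python
name = "Tom"
age = 58

# One placeholder: the right-hand side can be a single value.
single = "Hello, %s" % name

# Several placeholders: pack the values into a tuple, in order.
several = "%s is %d years old" % (name, age)

# Named placeholders: pass a dictionary and refer to its keys.
named = "%(who)s is %(years)d years old" % {"who": name, "years": age}

print(single)
print(several)
print(named)
```

The dictionary form is handy when the same value appears more than once, since you only have to supply it once.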

Now, one of the major problems with “%d” is that floating-point numbers are truncated to integers, so the fractional part is lost.

number = 4.6
print("This number has been changed to the integer %d rather than being inserted correctly as %f" % (int(number), number))

When we run that particular piece of code we receive the following response:

This number has been changed to the integer 4 rather than being inserted correctly as 4.600000
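
You do not even need to call int() yourself to lose the fraction. As a small sketch of what happens when a float is handed straight to “%d”, and two ways around it:

```python
number = 4.6

truncated = "%d" % number         # the fractional part is dropped, not rounded
rounded = "%d" % round(number)    # round first if that is what you want
as_float = "%.1f" % number        # %f keeps the value; .1 limits the decimals

print(truncated, rounded, as_float)
```

Note that 4.6 becomes 4 rather than 5, which is an easy way to introduce a quiet off-by-one into a report.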

You will have noticed that I snuck in the %f operator; this one keeps floating-point numbers as floating-point numbers.  It has a cousin, the %F operator, which behaves the same way for ordinary numbers.

number = 1234567890.1234567890
print("The number is %f and the number is %F" % (number, number))

This will output:

The number is 1234567890.123457 and the number is 1234567890.123457

As you can see, both the %f and %F operators format the floating-point number with six decimal places. They differ only for the special values infinity and not-a-number, which %f prints as “inf” and “nan” and %F prints as “INF” and “NAN”.  But what if we want to display a hexadecimal number? For this we use the %x or %X operator.
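
Here is a short sketch of the floating-point conversions from the table side by side, including a precision (the “.2”) and the exponential forms. The variable names are only for illustration.

```python
number = 1234567890.1234567890

fixed_lower = "%f" % number       # six decimal places by default
fixed_upper = "%F" % number       # identical for ordinary numbers
two_places = "%.2f" % number      # a precision limits the decimals
exponent = "%e" % number          # exponential format, lowercase e
exponent_upper = "%E" % number    # exponential format, uppercase E

# The only place %f and %F differ: special values.
nan_lower = "%f" % float("nan")
nan_upper = "%F" % float("nan")

print(fixed_lower, two_places, exponent, exponent_upper, nan_lower, nan_upper)
```

If you want uppercase exponent notation, it is %E you are after, not %F.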

number = 1234567890
print("The number %d is %X in hexadecimal" % (number, number))

This will output:

The number 1234567890 is 499602D2 in hexadecimal

What if you want to play with octal (base 8 rather than base 10)? In this case you will use %o.

number = 12345677890
print("The number is %f and %o in octal" % (number, number))

This will output:

The number is 12345677890.000000 and 133767014102 in octal
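
The hexadecimal and octal conversions also accept a few flags. The sketch below uses a small number so the results are easy to check by hand; the “#” flag adds the base prefix and “08” pads with zeros to eight characters.

```python
number = 255

hex_lower = "%x" % number       # lowercase digits
hex_upper = "%X" % number       # uppercase digits
hex_prefixed = "%#x" % number   # '#' adds the 0x prefix
octal = "%o" % number           # base 8
octal_prefixed = "%#o" % number # '#' adds the 0o prefix
hex_padded = "%08x" % number    # zero-padded to a width of 8

# int() with a base converts the text back into a number.
round_trip = int(hex_lower, 16)

print(hex_lower, hex_upper, hex_prefixed, octal, octal_prefixed, hex_padded, round_trip)
```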

Now that we have run through the C-style formatting, you may well be alarmed to hear that this is called “old style” string formatting. Yes, that is unfortunately the case: Python introduced a newer way to format strings. You may be asking why you even need to learn the old style. The simple answer is legacy code, plus the fact that it is still supported and coders are generally a lazy lot; if it works they will use it until they cannot any more.  OK, so let’s move on and start looking at the “new” style of formatting a string.

New Style String formatting

This style of formatting was introduced in Python 3.0 and was also made available in Python 2.6 and 2.7, so there is still a lot of “legacy” code out there that predates it.  If you are involved in any code rework, consider removing the C-style formatting from your code.  The string method “format()” can be used to do simple positional formatting, much like “old style” formatting.  The example below shows the new style formatting in use.

name = "Tom"
print("Greetings and Salutations, {}".format(name))

The output of this script is:

Greetings and Salutations, Tom

Alternatively, you can call your variable substitutions by name and utilise them in any order you wish. This is a very useful feature because it allows you to rearrange the display order without modifying the arguments supplied to format():

name = "Tom"
errno = 0xbadc0ffee
print("Hey {name}, there is a 0x{errno:x} error!".format(name=name, errno=errno))

Running this script will return a response similar to that shown below:

Hey Tom, there is a 0xbadc0ffee error!

You may have noticed that there is an “x” in the block “{errno:x}”. Yes, the syntax for converting an int variable to a hexadecimal string has changed. You must now pass a format specification by appending a “:x” suffix. The format string syntax has grown in power while remaining easy to apply. Reading up on this string formatting mini-language in the Python docs is worthwhile.
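
To give a taste of that mini-language, here is a small sketch of a few format specifications I find myself reaching for. The values are made up; everything after the colon is the format specification.

```python
price = 1234.5

two_decimals = "{:.2f}".format(price)       # fixed number of decimals
thousands = "{:,}".format(1234567)          # comma as thousands separator
right_aligned = "{:>8}".format("Tom")       # pad on the left to width 8
left_aligned = "{:<8}|".format("Tom")       # pad on the right to width 8
zero_padded = "{:08.3f}".format(3.14159)    # width 8, 3 decimals, zero fill

# Positional indexes let you reorder the arguments in the output.
reordered = "{1} before {0}".format("a", "b")

print(two_decimals, thousands, right_aligned, left_aligned, zero_padded, reordered)
```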

This “new style” string formatting is favoured over %-style formatting in Python 3. While “old style” formatting has been downplayed, it has not been phased out. It is still supported in the most recent Python versions.

Even though there are currently no plans to deprecate “old style” formatting, personally I will not be using it moving forward, as the official Python 3 documentation neither recommends nor praises it:

The documentation notes that the formatting operations described here have a number of quirks that lead to common errors, such as failing to display tuples and dictionaries correctly. These problems can be avoided by using the newer formatted string literals or the str.format() interface, which also offer more powerful, adaptable, and extensible approaches to text formatting.
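
The tuple quirk is easy to reproduce. In this sketch (the variable “point” is just an example) the “%” operator unpacks the tuple as if it were a list of arguments, so a single “%s” is not enough:

```python
point = (3, 4)

try:
    broken = "Point: %s" % point       # the tuple is unpacked as two arguments
except TypeError as exc:
    broken = f"TypeError: {exc}"

wrapped = "Point: %s" % (point,)       # workaround: wrap it in a one-item tuple
formatted = "Point: {}".format(point)  # str.format() has no such ambiguity

print(broken)
print(wrapped)
print(formatted)
```

The workaround is simple enough once you know it, but it is exactly the kind of thing that bites you when a variable unexpectedly turns out to hold a tuple.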

The third method of string formatting is called string interpolation, or f-strings.  Please note this is f-strings, not f-bombs, which are something altogether different and nothing to do with Python. Well, maybe they have something to do with Python, but only when your code doesn’t run as expected.

String Interpolation / f-strings

The release of Python 3.6 added a new string formatting approach called formatted string literals or “f-strings”.  This new way of formatting strings lets you use embedded Python expressions inside string constants. Here’s a simple example to give you a feel for the feature:

name = "Tom"
print(f"Hello, {name}")

Running this will result in the following output:

Hello, Tom

As can be seen, the string is prefixed with the letter “f”, hence the name “f-strings”. This new formatting syntax is very powerful. Why?  Well, you can embed arbitrary expressions, and you can even do inline arithmetic with it. Check out this example:

a = 20
b = 30
print(f"twenty plus thirty is {a + b} and not {2 * (a + b)}.")

This results in the following response:

twenty plus thirty is 50 and not 100.

Formatted string literals are a Python parser feature: the parser converts an f-string into a series of string constants and expressions, which are then joined up to build the final string.

Consider a simple function, which we will call “greet()”, that contains an f-string:

def greet(name, question):
    return f"Hello, {name}! How's it {question}?"

print(greet('Tom', 'hanging'))

This script defines a function that takes two arguments (name and question) and returns a formatted string. The last line of the script calls “greet()” with the arguments “Tom” and “hanging”, and prints the resulting string to the console:

Hello, Tom! How's it hanging?

Formatted string literals also support the existing format string syntax of the str.format() method. So let’s revisit the script we created and transform it using an f-string.

name = "Tom"
errno = 0xbadc0ffee
print(f"Hey {name}, there is a 0x{errno:x} error!")

Re-running the script shows that the response is, as expected, the same.

Hey Tom, there is a 0xbadc0ffee error!
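
Since f-strings accept the same format specifications after the colon, alignment and number formatting carry over unchanged. A minimal sketch, with made-up values:

```python
name = "Tom"
balance = 1234.5

padded = f"{name:>6}"         # right-align in a field of width 6
money = f"{balance:,.2f}"     # thousands separator and two decimals
debug = f"{name!r}"           # !r uses repr(), handy when debugging
shout = f"{name.upper()}"     # any expression works, including method calls

print(padded, money, debug, shout)
```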

The final method of string formatting uses template strings, which are provided by a built-in module in the Python standard library.

Template Strings (Standard Library)

The final tool for string formatting in Python is the template string. It’s a simpler, but less powerful mechanism.

Let’s take a look at our simple greeting, “Hey, Tom!”, and rewrite it using template strings:

from string import Template
name = "Tom"
t = Template('Hey, $name!')
print(t.substitute(name=name))

The output should be similar to that shown below:

Hey, Tom!

You can see that we need to import the Template class from Python’s built-in string module. This is because template strings are not a core language feature; they are supplied by the string module in the standard library.  If you recall, I mentioned that template strings are simple but not very powerful. This is because template strings don’t allow format specifiers. If we want to rewrite our “errno” script to use a template string, we need to convert the error number into a hex string ourselves with “error=hex(errno)”:

from string import Template
name = "Tom"
errno = 0xbadc0ffee
templ_string = 'Hey $name, there is a $error error!'
print(Template(templ_string).substitute(name=name, error=hex(errno)))

Running this script will now show the error correctly:

Hey Tom, there is a 0xbadc0ffee error!

That worked great.
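
Two more details of the Template class are worth a quick sketch: a literal dollar sign is written as “$$”, and “safe_substitute()” leaves unknown placeholders in place instead of raising an error. The template text here is invented for the example.

```python
from string import Template

# ${item} uses braces so the trailing "s" is not read as part of the name.
t = Template("Hey $name, the ${item}s cost $$5")

complete = t.substitute(name="Tom", item="book")

try:
    t.substitute(name="Tom")             # 'item' is missing
    missing = "no error"
except KeyError as exc:
    missing = f"KeyError: {exc}"

# safe_substitute() fills in what it can and leaves the rest untouched.
partial = t.safe_substitute(name="Tom")

print(complete)
print(missing)
print(partial)
```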

OK, so why would I use a template string when it is underpowered compared to the other methods?

Consider this script:

SECRET = 'this-is-my super-duper secret secret'
class Error:
    def __init__(self):
        pass

user_input = '{error.__init__.__globals__[SECRET]}'
err = Error()
print(user_input.format(error=err))

What we are going to discuss now is a little more advanced, but I feel that it is important.  Python as a language is fairly ubiquitous, and it is easy to write code that is vulnerable to attack.

The code above creates the variable “SECRET” and gives it the value “this-is-my super-duper secret secret”.  We then define a class called “Error” with what is termed the constructor function, “__init__”; this function is empty.

The format string “{error.__init__.__globals__[SECRET]}” is stored in a string variable named user_input in the next line. The format string’s curly braces, {}, serve as placeholders for values that will be filled in later. An instance of the Error class is then created and assigned to the variable err.

In the final line we call the format() method on the user_input string and pass the result to print(). The format() method replaces the curly-brace placeholder with a value reached through the err object: it follows the attribute chain from err to its __init__ method, looks in that method’s __globals__ dictionary, and pulls out the value of the SECRET variable.  This is BAD.

this-is-my super-duper secret secret

This piece of code illustrates how a format string can reach data it was never meant to see. If the format string comes from a user, this is a security risk, so str.format() should not be used on user-supplied format strings in production code.

Let’s revisit that same code but use template strings to close this attack vector.  We simply replace the user_input line and the last line in the code with:

from string import Template  # needed once, at the top of the script
user_input = '${error.__init__.__globals__[SECRET]}'
print(Template(user_input).substitute(error=err))

Now when we run the code we get an error:

ValueError: Invalid placeholder in string: line 1, col 1
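
In a real program you would probably not want that error to crash the script. One possible way to handle it, as a rough sketch: the helper “render()” and its messages are my own invention, not part of the standard library.

```python
from string import Template

SECRET = "this-is-my super-duper secret secret"

def render(user_template, **values):
    """Render a user-supplied template, reporting bad input instead of crashing."""
    try:
        return Template(user_template).substitute(**values)
    except ValueError:
        # Raised for placeholders that are not plain identifiers.
        return "rejected: invalid placeholder"
    except KeyError as exc:
        # Raised for names that were not passed in as keyword arguments.
        return f"rejected: unknown name {exc}"

attack = render("${error.__init__.__globals__[SECRET]}", error=object())
unknown = render("Hello $SECRET", name="Tom")
ok = render("Hello $name", name="Tom")

print(attack)
print(unknown)
print(ok)
```

The template can only ever see the names you hand to it, so the global SECRET stays out of reach even when the user asks for it by name.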

So how do we choose our String Formatting method? They all have pros and cons.

With all this choice, how do we pick the best option for the task at hand?  To sum up, I have created this little flow diagram to help you down the logic path:

[Flow chart: choosing a string formatting method in Python]

A nice little rule of thumb to aid you in your choice: if your format strings are user-supplied, use template strings from the standard library to avoid security issues. Otherwise, use literal string interpolation (f-strings) if you’re on Python 3.6+, and “new style” str.format() if you’re not. Do not use the % operator.  If you are still confused, don’t worry, as we will be getting a lot of experience with string manipulation.
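
For those who think better in code than in diagrams, the rule of thumb can be written as a tiny function. This is only a sketch of the decision; “pick_formatter()” is a hypothetical helper of my own, not something Python provides.

```python
import sys

def pick_formatter(user_supplied, version=sys.version_info):
    """Return the suggested formatting method, following the rule of thumb."""
    if user_supplied:
        return "Template strings"     # safest for untrusted format strings
    if version >= (3, 6):
        return "f-strings"            # available from Python 3.6 onwards
    return "str.format()"             # fallback for older interpreters

print(pick_formatter(user_supplied=True))
print(pick_formatter(user_supplied=False, version=(3, 12)))
print(pick_formatter(user_supplied=False, version=(2, 7)))
```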

Summary

We have only scratched the surface of what you can do with string manipulation in Python.  Our knowledge will only increase as we use the language in anger.  In our next article we will delve deeper into the mystic art of string manipulation.
