Extracting numbers using regular expressions

It took me hours to understand how regular expressions work, so here is a brief explanation of the symbols used in the code.
. any single character (except a newline)
? zero or one of the previous item
+ one or more of the previous item
* zero or more of the previous item
[0-9] any single digit from 0 to 9
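A quick way to check each symbol is to run re.findall on short strings. The sample strings below are made up for illustration, and the raw strings (r'...') keep Python from interpreting backslashes before the regex engine sees them:

```python
import re

# "." matches any single character (a newline is excluded by default)
dot = re.findall(r'a.c', 'abc a-c ac')
# "?" makes the previous item optional: zero or one occurrence
opt = re.findall(r'-?[0-9]', '-1 2')
# "+" requires one or more occurrences of the previous item
plus = re.findall(r'[0-9]+', 'x 12 345')
# "*" accepts zero or more occurrences of the previous item
star = re.findall(r'ab*', 'a ab abbb')

print(dot)   # ['abc', 'a-c']
print(opt)   # ['-1', '2']
print(plus)  # ['12', '345']
print(star)  # ['a', 'ab', 'abbb']
```

Note that "ac" at the end of the first string is not matched: "." needs exactly one character between "a" and "c".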

Three floating point numbers

I have a data file that contains points in 3D space.
V = [
[ 0.000000, 0.000000, 30.000000], // 0 0-0
[10.298345, 0.000000, 28.177013], // 1 1-0
[ 3.182365, 9.794311, 28.177012], // 2 1-1
[-8.331539, 6.053200, 28.177017], // 3 1-2
[-8.331539, -6.053200, 28.177017], // 4 1-3
# …
];

I need to extract three floating point numbers for each point.
'[-]?[0-9]+\.[0-9]+' is the pattern for the floating point numbers I'm interested in:
[-]? an optional minus sign
[0-9]+ at least one digit
\. a literal decimal point
[0-9]+ at least one digit
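To see what each part of the pattern demands, here is a small check with re.fullmatch, which succeeds only when the whole string fits the pattern. The sample strings are illustrative, not taken from the data file:

```python
import re

FLOAT = r'[-]?[0-9]+\.[0-9]+'  # the floating point pattern from this post

samples = ['0.000000', '-8.331539', '30', '.5', '-6.']
# re.fullmatch returns a Match object only if the entire string fits, else None
results = {s: re.fullmatch(FLOAT, s) is not None for s in samples}
print(results)
# {'0.000000': True, '-8.331539': True, '30': False, '.5': False, '-6.': False}
```

'30' fails because the dot is required, and '.5' and '-6.' fail because the pattern needs at least one digit on each side of the dot.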

The pattern skips the integers in the comments, such as 0, 1, or 1-0, because they don't contain a decimal point.
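Applied to one full line of the data file, the pattern picks up only the three coordinates; the comment part yields nothing:

```python
import re

FLOAT = r'[-]?[0-9]+\.[0-9]+'
line = '[10.298345, 0.000000, 28.177013], // 1 1-0'

# Only the three coordinates contain a decimal point, so only they match
coords = re.findall(FLOAT, line)
# The comment part " 1 1-0" has no dot at all, so nothing matches there
in_comment = re.findall(FLOAT, line.split('//')[1])

print(coords)      # ['10.298345', '0.000000', '28.177013']
print(in_comment)  # []
```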

re.findall(pattern, string) finds every non-overlapping match of the pattern in the string and returns the matched substrings as a list.
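For comparison, re.search stops at the first match and returns a Match object (or None), while re.findall returns all matches at once. One detail worth knowing: if the pattern contains a capturing group, re.findall returns the group's text instead of the whole match. A short sketch using one line from the data file:

```python
import re

s = '[-8.331539, 6.053200, 28.177017], // 3 1-2'
pattern = r'[-]?[0-9]+\.[0-9]+'

first = re.search(pattern, s)             # Match object for the first hit
every = re.findall(pattern, s)            # list of every non-overlapping match
missing = re.search(pattern, '// 3 1-2')  # no decimal point here, so None

# With one capturing group, findall returns only the group's text
whole_parts = re.findall(r'([0-9]+)\.[0-9]+', s)

print(first.group())  # -8.331539
print(every)          # ['-8.331539', '6.053200', '28.177017']
print(missing)        # None
print(whole_parts)    # ['8', '6', '28']
```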

s = '[ 0.000000, 0.000000, 30.000000], // 0 0-0'
x, y, z = re_search(s, r'[-]?[0-9]+\.[0-9]+', is_single=False)

The example returns ['0.000000', '0.000000', '30.000000']. When you assign a list to as many variables as it has elements, Python assigns the elements to the variables one by one, in order.

x, y, z = ['0.000000', '0.000000', '30.000000'] is equivalent to these four lines of code:

t = ['0.000000', '0.000000', '30.000000']
x = t[0]
y = t[1]
z = t[2]
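The unpacking only works when the counts match. Here is a small sketch of what happens otherwise, plus Python's starred form, which collects the leftover elements into a list:

```python
t = ['0.000000', '0.000000', '30.000000']

x, y, z = t         # three names, three elements: works
first, *rest = t    # a starred name collects whatever is left over

try:
    a, b = t        # two names, three elements
    mismatch = False
except ValueError:
    mismatch = True  # Python refuses to unpack when the counts differ

print(x, y, z)      # 0.000000 0.000000 30.000000
print(first, rest)  # 0.000000 ['0.000000', '30.000000']
print(mismatch)     # True
```

This is why the number of variables on the left must equal the number of matches re.findall returns.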

The returned values are strings, not numbers. Applying round(float(n), 6) to each of them turns x, y, and z into the floats 0.0, 0.0, and 30.0.
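A minimal sketch of the conversion: float() alone turns the strings into numbers, and round(..., 6) has no visible effect on values that already have six decimals. Neither keeps the trailing zeros, so if you want to display six decimals again, format the float when printing:

```python
import re

s = '[ 0.000000,  0.000000, 30.000000], //  0 0-0'
# Convert each matched string to a float while unpacking
x, y, z = (float(n) for n in re.findall(r'[-]?[0-9]+\.[0-9]+', s))

print(x, y, z)     # 0.0 0.0 30.0
print(f'{z:.6f}')  # 30.000000  (six decimals for display only)
```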

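import re

def re_search(src, pattern, is_single=True):
    """Return the first match of pattern in src, or a list of all matches."""
    if is_single:
        m = re.search(pattern, src)
        # re.search returns None when nothing matches, so guard before .group()
        return m.group() if m else None
    else:
        # re.findall returns a list of every non-overlapping match
        return re.findall(pattern, src)

s = '[ 0.000000,  0.000000, 30.000000], //  0 0-0'
# Raw string (r'...') so that \. reaches the regex engine unchanged
x, y, z = re_search(s, r'[-]?[0-9]+\.[0-9]+', is_single=False)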
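The same idea extends to the whole data file. The sketch below assumes the file contents are already in a string (here, the rows shown at the top of this post) and keeps only lines that contain exactly three floating point numbers; parse_points is a hypothetical helper name, not part of any library:

```python
import re

# Sample text copied from the rows of the data file shown above
DATA = """V = [
[ 0.000000, 0.000000, 30.000000], // 0 0-0
[10.298345, 0.000000, 28.177013], // 1 1-0
[ 3.182365, 9.794311, 28.177012], // 2 1-1
[-8.331539, 6.053200, 28.177017], // 3 1-2
[-8.331539, -6.053200, 28.177017], // 4 1-3
];"""

# Compile once, reuse for every line
FLOAT = re.compile(r'[-]?[0-9]+\.[0-9]+')

def parse_points(text):
    """Hypothetical helper: one (x, y, z) float tuple per line with three floats."""
    points = []
    for line in text.splitlines():
        values = FLOAT.findall(line)
        if len(values) == 3:  # skips "V = [", "];" and any other line
            x, y, z = (float(v) for v in values)
            points.append((x, y, z))
    return points

points = parse_points(DATA)
print(len(points))  # 5
print(points[4])    # (-8.331539, -6.0532, 28.177017)
```

To read an actual file, open it with a with statement and pass f.read() to parse_points; the file name is up to you.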

Two integers and one floating point number

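a, b, and c get '0', '1', and '0', and length gets '16.400000'. The last value, c, is the index in the trailing comment.
Because the dot and the digits after it are optional in [0-9]+\.?[0-9]*, this pattern matches integers as well as decimals.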

s = '[ 0,  1, 16.400000, "-"], //   0'
a, b, length, c = re_search(s, r'[0-9]+\.?[0-9]*', is_single=False)
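Since this pattern returns integers and decimals together, you may want each string converted to the matching type. A small sketch with a hypothetical to_number helper, plus a reminder that this pattern has no minus sign, so it strips the sign from negative values:

```python
import re

def to_number(text):
    """Hypothetical helper: int for digit-only strings, float if a '.' is present."""
    return float(text) if '.' in text else int(text)

s = '[ 0,  1, 16.400000, "-"], //   0'
a, b, length, c = (to_number(n) for n in re.findall(r'[0-9]+\.?[0-9]*', s))
print(a, b, length, c)  # 0 1 16.4 0

# Without [-]? in the pattern, a negative value loses its sign
unsigned = re.findall(r'[0-9]+\.?[0-9]*', '[-8.331539, 6.053200]')
print(unsigned)         # ['8.331539', '6.053200']
```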

Three integers

a, b, c, and d get '0', '1', '2', and '0'. The fourth value, d, is the index in the trailing comment.

s = '[ 0, 1, 2], // 0'
a, b, c, d = re_search(s, r'[0-9]+\.?[0-9]*', is_single=False)
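If you want the bracketed values kept apart from the comment index, one option is to split the line at "//" first with str.partition. This sketch assumes every line has a "//" comment; without one, the comment part is empty and the last unpacking raises ValueError:

```python
import re

s = '[ 0, 1, 2], // 0'
# partition splits at the first "//" into (before, separator, after)
data, _, comment = s.partition('//')

a, b, c = re.findall(r'[0-9]+\.?[0-9]*', data)  # the three values in brackets
(d,) = re.findall(r'[0-9]+', comment)            # the index from the comment
print(a, b, c, d)  # 0 1 2 0
```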

About janpenguin

Email: k2.mountain [at] gmail [dot] com All content on this blog is made with Free and Open Source Software on GNU/Linux.
This entry was posted in Python.
