# Regular expressions to find numbers in a string (Python)
import re
# Example with integers and floats, both positive and negative, plus scientific notation.
target_str = 'I live at 9-162 Malibeu. My phone number is +351911199911. I have 5.50 dollars with me, but I have a net income of -1.01 per day which is about -1 dollar a day with an error of +-.01. Also the earth has a mass of 5.972e24 kg or about 6e24 kg.'
# Depending on what you want (p=positive, n=negative):
# Raw strings (r"...") keep the backslashes in \d and \. intact.
regex_expressions = {
    'p_ints' : r"\d+",
    'pn_ints' : r"[-+]?\d+",
    'p_floats' : r"\d*\.\d+",
    'pn_floats' : r"[-+]?\d*\.\d+",
    'scientific_notation' : r"[-+]?\d+(?:\.\d+)?e[-+]?\d+",
    'pn_floats_or_ints' : r"(?:[-+]?)(?:\d*\.\d+|\d+)",
    'universal' : r"(?:[-+]?)(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)"
}
regex_results = dict()
for target_type, regex_expression in regex_expressions.items():
    regex_results[target_type] = re.findall(regex_expression, target_str)
    print(target_type, ':', regex_results[target_type])
print('\nThese results are still strings, but can easily be turned into floats or ints:')
for number in regex_results['universal']:
    print(float(number))
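
# A minimal sketch, not part of the original snippet: float() turns every match
# into a float, so '9' becomes 9.0. The helper below keeps integer-looking
# matches as ints instead. NUMBER_RE and extract_numbers are hypothetical names
# chosen for illustration.
import re

# The 'universal' pattern, with digits written as \d and the dot escaped.
NUMBER_RE = re.compile(r"[-+]?(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)")

def extract_numbers(text):
    """Return the numbers found in text as ints where possible, else floats."""
    numbers = []
    for token in NUMBER_RE.findall(text):
        try:
            numbers.append(int(token))    # e.g. '9', '-162', '+351911199911'
        except ValueError:
            numbers.append(float(token))  # e.g. '5.50', '-.01', '6e24'
    return numbers

# Usage: extract_numbers('I have 5.50 dollars and 3 coins')  # -> [5.5, 3]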
r"""
Used RegEx symbols:
[] : look for any one of the characters inside the brackets
\d : look for any digit
\. : look for a literal dot (.)
+ : look for one or more occurrences of the previous expression
* : look for zero or more occurrences of the previous expression
? : look for zero or one occurrence of the previous expression
(?:...) : create a non-capturing group
| : look for either the expression on its left or the one on its right (OR operator)
Short explanation of each regex:
-> positive integers: \d+
look for one or more digits
-> positive or negative integers: [-+]?\d+
look for one or more digits, potentially preceded by a '-' or a '+'
-> positive floats: \d*\.\d+
look for zero or more digits, followed by a dot, followed by one or more digits (so a shorthand such as '.3' also matches). Scientific notation is not allowed.
-> positive or negative floats: [-+]?\d*\.\d+
look for zero or more digits, followed by a dot, followed by one or more digits, potentially preceded by a '-' or a '+'
-> scientific notation: [-+]?\d+(?:\.\d+)?e[-+]?\d+
look for an optional '+' or '-' sign. Look for one or more digits, potentially followed by a dot and a decimal part. Look for a lowercase 'e', followed by an optional sign and one or more digits.
-> any number not in scientific notation: (?:[-+]?)(?:\d*\.\d+|\d+)
look for an optional '+' or '-' sign. Look for zero or more digits, followed by a dot, followed by one or more digits (float) OR look for one or more digits (integer).
-> any number: (?:[-+]?)(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)
look for an optional '+' or '-' sign, then do an OR between the previous expressions using non-capturing groups. Scientific notation is tried first so that '5.972e24' is not cut short at '5.972'.
"""
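
# A minimal sketch, assuming re.VERBOSE is acceptable here: the 'any number'
# pattern with the explanation above attached as inline comments.
# UNIVERSAL_VERBOSE is a hypothetical name, not part of the original snippet.
import re

UNIVERSAL_VERBOSE = re.compile(r"""
    [-+]?                       # optional sign
    (?:
        \d+(?:\.\d+)?e[-+]?\d+  # scientific notation, tried first
      | \d*\.\d+                # float; the leading digits are optional
      | \d+                     # integer
    )
""", re.VERBOSE)

# In verbose mode, whitespace and '#' comments inside the pattern are ignored,
# so the pattern is meant to match the same text as the one-line version.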
"""
OUTPUT:
p_ints : ['9', '162', '351911199911', '5', '50', '1', '01', '1', '01', '5', '972', '24', '6', '24']
pn_ints : ['9', '-162', '+351911199911', '5', '50', '-1', '01', '-1', '01', '5', '972', '24', '6', '24']
p_floats : ['5.50', '1.01', '.01', '5.972']
pn_floats : ['5.50', '-1.01', '-.01', '5.972']
scientific_notation : ['5.972e24', '6e24']
pn_floats_or_ints : ['9', '-162', '+351911199911', '5.50', '-1.01', '-1', '-.01', '5.972', '24', '6', '24']
universal : ['9', '-162', '+351911199911', '5.50', '-1.01', '-1', '-.01', '5.972e24', '6e24']
These results are still strings, but can easily be turned into floats or ints:
9.0
-162.0
351911199911.0
5.5
-1.01
-1.0
-0.01
5.972e+24
6e+24
"""
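
# A hedged variation, not part of the original snippet: in the output above,
# the hyphen in '9-162' is read as a minus sign ('-162'). Assuming a sign should
# only count when it does not directly follow a digit, a negative lookbehind
# can express that. STRICT_SIGN_RE is a hypothetical name.
import re

STRICT_SIGN_RE = re.compile(
    r"(?<!\d)[-+]?(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)"
)

# With this pattern, '9-162' yields '9' and '162', while '-1.01' after a space
# keeps its sign. Whether that is the desired reading depends on the input.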