regular expression to find numbers in a string python
import re

# Example with integers and floats, both positive and negative, and scientific notation.
target_str = 'I live at 9-162 Malibeu. My phone number is +351911199911. I have 5.50 dollars with me, but I have a net income of -1.01 per day which is about -1 dollar a day with an error of +-.01. Also the earth has a mass of 5.972e24 kg or about 6e24 kg.'

# Depending on what you want (p=positive, n=negative).
# Raw strings keep the backslashes in \d and \. intact.
regex_expressions = {
    'p_ints': r"\d+",
    'pn_ints': r"[-+]?\d+",
    'p_floats': r"\d*\.\d+",
    'pn_floats': r"[-+]?\d*\.\d+",
    'scientific_notation': r"[-+]?\d+(?:\.\d+)?e[-+]?\d+",
    'pn_floats_or_ints': r"(?:[-+]?)(?:\d*\.\d+|\d+)",
    'universal': r"(?:[-+]?)(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)",
}

regex_results = {}
for target_type, regex_expression in regex_expressions.items():
    # None of the patterns has a capturing group, so findall returns the full matches.
    regex_results[target_type] = re.findall(regex_expression, target_str)
    print(target_type, ':', regex_results[target_type])

print('\nThese results are still strings, but can easily be turned into floats or ints:')
for number in regex_results['universal']:
    print(float(number))

r"""
Used RegEx symbols:
    []      : look for any one character inside the brackets
    \d      : look for any digit
    \.      : look for a literal dot (.)
    +       : look for one or more occurrences of the previous expression
    *       : look for zero or more occurrences of the previous expression
    ?       : look for zero or one occurrence of the previous expression
    (?:...) : create a non-capturing group
    |       : look for either the expression on its left or the one on its right (OR operator)

Short explanation of each regex:
    -> positive integers: \d+
       look for one or more digits
    -> positive or negative integers: [-+]?\d+
       look for one or more digits, potentially preceded by a '-' or a '+'
    -> positive floats: \d*\.\d+
       look for zero or more digits, followed by a dot, followed by one or more digits
       (a shorthand representation such as '.3' also matches). Scientific notation is not allowed.
    -> positive or negative floats: [-+]?\d*\.\d+
       look for zero or more digits, followed by a dot, followed by one or more digits,
       potentially preceded by a '-' or a '+'
    -> scientific notation: [-+]?\d+(?:\.\d+)?e[-+]?\d+
       look for a '+' or '-' sign, if it exists. Look for one or more digits, potentially
       followed by a dot and a decimal part. Look for an 'e', followed by an optional sign
       and one or more digits
    -> any number not in scientific notation: (?:[-+]?)(?:\d*\.\d+|\d+)
       look for a '+' or '-' sign, if it exists. Look for zero or more digits, followed by a dot,
       followed by one or more digits (float) OR look for one or more digits (integer).
    -> any number: (?:[-+]?)(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)
       look for a '+' or '-' sign, if it exists, and then do an OR between the previous
       expressions using non-capturing groups. Scientific notation is tried first so that
       '5.972e24' is not cut short at '5.972'.
"""

"""
OUTPUT:
p_ints : ['9', '162', '351911199911', '5', '50', '1', '01', '1', '01', '5', '972', '24', '6', '24']
pn_ints : ['9', '-162', '+351911199911', '5', '50', '-1', '01', '-1', '01', '5', '972', '24', '6', '24']
p_floats : ['5.50', '1.01', '.01', '5.972']
pn_floats : ['5.50', '-1.01', '-.01', '5.972']
scientific_notation : ['5.972e24', '6e24']
pn_floats_or_ints : ['9', '-162', '+351911199911', '5.50', '-1.01', '-1', '-.01', '5.972', '24', '6', '24']
universal : ['9', '-162', '+351911199911', '5.50', '-1.01', '-1', '-.01', '5.972e24', '6e24']

These results are still strings, but can easily be turned into floats or ints:
9.0
-162.0
351911199911.0
5.5
-1.01
-1.0
-0.01
5.972e+24
6e+24
"""
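The snippet above converts every match with float(), which turns integers such as '9' into 9.0. If the distinction matters, one possible approach is to try int() first and fall back to float(). This is only a minimal sketch: the names UNIVERSAL_NUMBER, parse_number and extract_numbers are hypothetical helpers introduced for illustration, and the pattern is the 'universal' expression from above written as a raw string.

```python
import re

# The 'universal' pattern from the snippet above, compiled once for reuse.
UNIVERSAL_NUMBER = re.compile(r"(?:[-+]?)(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)")


def parse_number(text):
    """Hypothetical helper: return an int if text is a plain integer, else a float."""
    try:
        return int(text)
    except ValueError:
        # Strings such as '5.50', '-.01' or '6e24' are not valid ints.
        return float(text)


def extract_numbers(target):
    """Hypothetical helper: find every number in target and convert it."""
    return [parse_number(match) for match in UNIVERSAL_NUMBER.findall(target)]


print(extract_numbers("I have 5.50 dollars, -1 apples and a mass of 6e24 kg."))
# -> [5.5, -1, 6e+24]
```

Values in scientific notation always come back as floats here, even when they are whole numbers, because int() does not accept the 'e' form.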