Answers for "how to extract numeric values from a string in python regex"

regular expression to find numbers in a string python

import re

# Example with integers, floats (positive and negative), and scientific notation.
target_str = 'I live at 9-162 Malibeu. My phone number is +351911199911. I have 5.50 dollars with me, but I have a net income of -1.01 per day which is about -1 dollar a day with an error of +-.01. Also the earth has a mass of 5.972e24 kg or about 6e24 kg.'
# Depending on what you want (p=positive, n=negative).
# Raw strings (r"...") keep the backslashes in \d and \. intact.
regex_expressions = {
    'p_ints' :            r"\d+",
    'pn_ints' :           r"[-+]?\d+",
    'p_floats' :          r"\d*\.\d+",
    'pn_floats' :         r"[-+]?\d*\.\d+",
    'scientific_notation':r"[-+]?\d+(?:\.\d+)?e[-+]?\d+",
    'pn_floats_or_ints' : r"(?:[-+]?)(?:\d*\.\d+|\d+)",
    'universal':          r"(?:[-+]?)(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)"
}

regex_results = dict()

for target_type, regex_expression in regex_expressions.items():
    regex_results[target_type] = re.findall(regex_expression, target_str)
    print(target_type, ':', regex_results[target_type])

print('\nThese results are still strings, but can easily be turned into floats or ints:')
for number in regex_results['universal']:
    print(float(number))

r"""
Used RegEx symbols:
    [] : look for any one character inside the brackets
    \d : look for any digit
    \. : look for a literal dot (.)
    + : look for one or more occurrences of the previous expression
    * : look for zero or more occurrences of the previous expression
    ? : look for zero or one occurrence of the previous expression
    (?:...) : create a non-capturing group
    | : look for either the expression on its left or the one on its right (OR operator)

Short explanation of each regex:
    -> positive integers: \d+
        look for one or more digits
    -> positive or negative integers: [-+]?\d+
        look for one or more digits, potentially preceded by a '-' or a '+'
    -> positive floats: \d*\.\d+
        look for zero or more digits, followed by a dot, followed by one or more digits (a lazy representation such as '.3' works in this case). Scientific notation is not allowed.
    -> positive or negative floats: [-+]?\d*\.\d+
        look for zero or more digits, followed by a dot, followed by one or more digits, potentially preceded by a '-' or a '+'
    -> scientific notation: [-+]?\d+(?:\.\d+)?e[-+]?\d+
        look for a '+' or '-' sign, if one exists. Look for one or more digits, potentially followed by a dot and a decimal part. Look for an 'e', followed by an optionally signed run of one or more digits (the exponent).
    -> any number not in scientific notation: (?:[-+]?)(?:\d*\.\d+|\d+)
        look for a '+' or '-' sign, if one exists. Look for zero or more digits, followed by a dot, followed by one or more digits (float) OR look for one or more digits (integer).
    -> any number: (?:[-+]?)(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)
        look for an optional '+' or '-' and then do an OR between the previous expressions using non-capturing groups. The scientific-notation alternative comes first so that '5.972e24' is not cut short at '5.972'.
"""

"""
OUTPUT:
    p_ints : ['9', '162', '351911199911', '5', '50', '1', '01', '1', '01', '5', '972', '24', '6', '24']
    pn_ints : ['9', '-162', '+351911199911', '5', '50', '-1', '01', '-1', '01', '5', '972', '24', '6', '24']
    p_floats : ['5.50', '1.01', '.01', '5.972']
    pn_floats : ['5.50', '-1.01', '-.01', '5.972']
    scientific_notation : ['5.972e24', '6e24']
    pn_floats_or_ints : ['9', '-162', '+351911199911', '5.50', '-1.01', '-1', '-.01', '5.972', '24', '6', '24']
    universal : ['9', '-162', '+351911199911', '5.50', '-1.01', '-1', '-.01', '5.972e24', '6e24']
    
    These results are still strings, but can easily be turned into floats or ints:
    9.0
    -162.0
    351911199911.0
    5.5
    -1.01
    -1.0
    -0.01
    5.972e+24
    6e+24
"""
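
The loop above converts every match with float(), so integers such as -162 come back as -162.0. If you would rather keep integers as int, one possible approach is sketched below. The names NUMBER_RE, to_number and extract_numbers are made up for this example and are not part of the answer above; the pattern is the 'universal' regex with the backslashes written out.

```python
import re

# Same alternatives as the 'universal' pattern: scientific notation, float, int.
NUMBER_RE = re.compile(r"[-+]?(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)")


def to_number(token):
    """Return an int for plain integer tokens such as '-162', otherwise a float."""
    try:
        return int(token)
    except ValueError:
        # int() rejects '5.50', '-.01' and '6e24'; float() accepts all of them.
        return float(token)


def extract_numbers(text):
    """Find every numeric token in text and convert it to int or float."""
    return [to_number(token) for token in NUMBER_RE.findall(text)]


print(extract_numbers("I have 5.50 dollars, a net income of -1 and a mass of 6e24 kg."))
# prints [5.5, -1, 6e+24]
```

Trying int() first and falling back to float() avoids a second regex just to tell the two kinds of token apart.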
Posted by: Guest on August-08-2021
