Answers for "how to parse robots.txt using requests"



import requests

url = "https://www.example.com"  # base URL of the site to check

result_data_set = {"Disallowed": [], "Allowed": []}

result = requests.get(url + "/robots.txt").text

for line in result.split("\n"):
    if line.startswith('Allow'):    # this is an allowed url rule
        result_data_set["Allowed"].append(line.split(': ')[1].split(' ')[0])    # keep only the path, dropping comments or other junk
    elif line.startswith('Disallow'):    # this is a disallowed url rule
        result_data_set["Disallowed"].append(line.split(': ')[1].split(' ')[0])

print(result_data_set)
Posted by: Guest on April-02-2022
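
For comparison, Python's standard library already ships a robots.txt parser, `urllib.robotparser`, which handles the rule matching for you. A minimal sketch (the `robots_txt` string and paths below are made-up examples; with a live site you would call `set_url(...)` and `read()` instead of `parse(...)`):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (a made-up sample for illustration)
robots_txt = """User-agent: *
Disallow: /private/
Allow: /public/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())  # parse rules from the lines

# can_fetch(user_agent, path) answers whether a URL may be crawled
print(rp.can_fetch("*", "/public/page.html"))   # True
print(rp.can_fetch("*", "/private/page.html"))  # False
```

This avoids edge cases the string-splitting approach misses, such as `Disallow:` lines with no path or per-user-agent rule groups.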
