Answers for "how to parse robots.txt using requests"



import requests

url = "https://www.example.com"  # base URL of the site to check

result_data_set = {"Disallowed": [], "Allowed": []}

result = requests.get(url + "/robots.txt").text

for line in result.split("\n"):
    if line.startswith('Allow'):    # this is an allowed url rule
        result_data_set["Allowed"].append(line.split(': ')[1].split(' ')[0])    # keep only the path, dropping comments or other junk
    elif line.startswith('Disallow'):    # this is a disallowed url rule
        result_data_set["Disallowed"].append(line.split(': ')[1].split(' ')[0])

print(result_data_set)
Posted by: Guest on April-02-2022
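
For comparison, Python's standard library already ships a robots.txt parser, `urllib.robotparser`, which handles the rule matching for you. A minimal sketch (the `robots_txt` string and paths below are made-up examples; with a live site you would call `set_url(...)` and `read()` instead of `parse(...)`):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (a made-up sample for illustration)
robots_txt = """User-agent: *
Disallow: /private/
Allow: /public/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())  # parse rules from the lines

# can_fetch(user_agent, path) answers whether a URL may be crawled
print(rp.can_fetch("*", "/public/page.html"))   # True
print(rp.can_fetch("*", "/private/page.html"))  # False
```

This avoids edge cases the string-splitting approach misses, such as `Disallow:` lines with no path or per-user-agent rule groups.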
