I am unable to scrape just this one piece of data, although I can extract all the other columns. I am attaching a photo of the data that I cannot extract.

rizwan_tk · December 14, 2024, 2:01am

import requests
from bs4 import BeautifulSoup
import pandas as pd
from googletrans import Translator

# Initialize the translator
translator = Translator()

# List of target URLs
urls = [
"https://tenders.etimad.sa/Tender/DetailsForVisitor?STenderId=SHU99UPx2RfATJ8YhrqYYw=="
]

# Initialize an empty list to hold all data
all_data = []
columns = []  # To hold the column headers (keys)

# Loop through each URL
for url in urls:
    try:
        # Send GET request to fetch the HTML content
        response = requests.get(url)

        # Check if the request was successful
        if response.status_code == 200:
            # Parse the HTML content using BeautifulSoup
            soup = BeautifulSoup(response.text, 'html.parser')

            # Initialize a dictionary to hold key-value pairs for this URL
            url_data = {}

            # Find all items with 'list-group-item' class
            items = soup.find_all('li', class_='list-group-item')
            for item in items:
                # Extract the title (key) and info (value)
                title_element = item.find('div', class_='etd-item-title')
                info_element = item.find('div', class_='etd-item-info')

                if title_element and info_element:
                    # Extract the text
                    title_ar = title_element.text.strip()
                    info_ar = ' '.join(span.text.strip() for span in info_element.find_all('span') if span.text.strip())

                    # Translate Arabic to English
                    title_en = translator.translate(title_ar, src='ar', dest='en').text
                    info_en = translator.translate(info_ar, src='ar', dest='en').text

                    # Add to dictionary (key as title, value as info)
                    url_data[title_en] = info_en

            # Append the dictionary of key-value pairs for this URL
            all_data.append(url_data)

            # Merge the column names (titles) for the first time
            if not columns:
                columns = list(url_data.keys())

        else:
            print(f"Failed to fetch the page. Status code: {response.status_code} for {url}")

    except requests.exceptions.RequestException as e:
        print(f"Error fetching the page: {e}")

# Convert the list of dictionaries to a DataFrame
df = pd.DataFrame(all_data, columns=columns)

# Save the DataFrame to an Excel file
output_file = 'key_value_data_with_titles_as_columns.xlsx'
df.to_excel(output_file, index=False, engine='openpyxl')
print(f"Data saved to {output_file}")

tshrinivasan · December 14, 2024, 1:44pm

I do not fully understand your question.

Which piece of information in the picture are you unable to scrape?
Please share its URL.
Have you looked at the HTML code for it? Please share its XPath as well.
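
For readers new to XPath, here is a minimal sketch of what "the XPath of an element" means in practice. It uses only Python's standard library, which supports a limited subset of XPath and needs well-formed markup. The sample markup is hypothetical: it reuses the class names from the posted code, but the real page's structure may differ, and real-world HTML is usually parsed with a more tolerant library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample. The class names are taken from the
# posted code; the actual page markup is an assumption.
SAMPLE = """
<ul>
  <li class="list-group-item">
    <div class="etd-item-title">Title A</div>
    <div class="etd-item-info"><span>Value A</span></div>
  </li>
  <li class="list-group-item">
    <div class="etd-item-title">Title B</div>
    <div class="etd-item-info"><span>Value B</span></div>
  </li>
</ul>
"""


def extract_pairs(markup):
    """Return {title: info} for every list item found via XPath-style paths."""
    root = ET.fromstring(markup)
    pairs = {}
    # ".//li[@class='...']" selects every <li> with that exact class attribute.
    for item in root.findall(".//li[@class='list-group-item']"):
        title = item.find("./div[@class='etd-item-title']")
        info = item.find("./div[@class='etd-item-info']")
        if title is None or info is None:
            continue  # skip items that lack either part
        spans = info.findall(".//span")
        value = " ".join(s.text.strip() for s in spans if s.text and s.text.strip())
        pairs[(title.text or "").strip()] = value
    return pairs


pairs = extract_pairs(SAMPLE)
print(pairs)
```

In a browser, the same kind of path can be read off by right-clicking the element in the developer tools and inspecting where it sits in the tag tree.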

rizwan_tk · December 14, 2024, 4:06pm

Sir,
Portal link:
"https://tenders.etimad.sa/Tender/DetailsForVisitor?STenderId=SHU99UPx2RfATJ8YhrqYYw=="

rizwan_tk · December 14, 2024, 4:17pm

Sir, I only recently learned Python programming in your class. With what I know, and with the help of a few videos and ChatGPT, I was able to extract only part of the data. Beyond that, XPath and HTML are very new to me. Please teach me this, sir. I am eager to learn.

Code:

import requests
from bs4 import BeautifulSoup
import pandas as pd
from googletrans import Translator

# Initialize the translator
translator = Translator()

# List of target URLs
urls = [
"https://tenders.etimad.sa/Tender/DetailsForVisitor?STenderId=SHU99UPx2RfATJ8YhrqYYw=="
]

# Initialize an empty list to hold all data
all_data = []
columns = []  # To hold the column headers (keys)

# Loop through each URL
for url in urls:
    try:
        # Send GET request to fetch the HTML content
        response = requests.get(url)

        # Check if the request was successful
        if response.status_code == 200:
            # Parse the HTML content using BeautifulSoup
            soup = BeautifulSoup(response.text, 'html.parser')
            
            # Initialize a dictionary to hold key-value pairs for this URL
            url_data = {}
            
            # Find all items with 'list-group-item' class
            items = soup.find_all('li', class_='list-group-item')
            for item in items:
                # Extract the title (key) and info (value)
                title_element = item.find('div', class_='etd-item-title')
                info_element = item.find('div', class_='etd-item-info')
                
                if title_element and info_element:
                    # Extract the text
                    title_ar = title_element.text.strip()
                    info_ar = ' '.join(span.text.strip() for span in info_element.find_all('span') if span.text.strip())
                    
                    # Translate Arabic to English
                    title_en = translator.translate(title_ar, src='ar', dest='en').text
                    info_en = translator.translate(info_ar, src='ar', dest='en').text
                    
                    # Add to dictionary (key as title, value as info)
                    url_data[title_en] = info_en
            
            # Append the dictionary of key-value pairs for this URL
            all_data.append(url_data)

            # Merge the column names (titles) for the first time
            if not columns:
                columns = list(url_data.keys())
        
        else:
            print(f"Failed to fetch the page. Status code: {response.status_code} for {url}")
    
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the page: {e}")

# Convert the list of dictionaries to a DataFrame
df = pd.DataFrame(all_data, columns=columns)

# Save the DataFrame to an Excel file
output_file = 'key_value_data_with_titles_as_columns.xlsx'
df.to_excel(output_file, index=False, engine='openpyxl')
print(f"Data saved to {output_file}")
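
When one field is missing while the others scrape fine, a useful first check is whether the text seen in the browser is present at all in the HTML that `requests` receives, and if so, under which tags and classes. The thread does not say why this field is missing, so the following is only a diagnostic sketch. The `TextLocator` class, the `locate` helper, and the sample markup are all hypothetical names and data introduced for illustration.

```python
from html.parser import HTMLParser


class TextLocator(HTMLParser):
    """Record the open-tag path of every text node that contains `needle`."""

    # Tags that never have a closing tag, so they must not go on the stack.
    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.stack = []  # currently open tags, e.g. ["ul", "li.list-group-item"]
        self.hits = []   # one "a > b > c" path per matching text node

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        css_class = dict(attrs).get("class")
        self.stack.append(f"{tag}.{css_class}" if css_class else tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerate sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.needle in data:
            self.hits.append(" > ".join(self.stack))


def locate(html, needle):
    """Return the tag paths under which `needle` appears in `html`."""
    parser = TextLocator(needle)
    parser.feed(html)
    parser.close()
    return parser.hits


# Hypothetical sample standing in for response.text.
SAMPLE_HTML = """
<ul>
  <li class="list-group-item">
    <div class="etd-item-title">Reference number</div>
    <div class="etd-item-info"><span>12345</span></div>
  </li>
</ul>
<div id="dates"></div>
"""

hits = locate(SAMPLE_HTML, "12345")
missing = locate(SAMPLE_HTML, "2024-12-31")
print(hits)
print(missing)
```

Against the real page, the call would be `locate(response.text, "<text copied from the browser>")`. An empty result suggests the value is not in the static HTML (for example, it may be filled in later by JavaScript or served from a different URL); a non-empty result shows which tags and classes the selectors need to target.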

tshrinivasan · December 15, 2024, 1:56pm

Programming is like biryani. It is made up of many different components.

You can make biryani by watching a video, but it only turns out well if you first learn the basics of cooking.

So, instead of taking code as-is from AI, Stack Overflow, or GitHub, learn the underlying basics and then practice writing the code yourself.

I ask you to read up on and practice HTML, XPath, CSS, Selenium, BeautifulSoup, mechanize, and requests.

rizwan_tk · December 15, 2024, 3:10pm

Certainly, sir. Thank you.