Python - extract info from a string
The following code is working, but I am not able to extract the information I need. Can I use BeautifulSoup for this, or do I need a regular expression?

    from bs4 import BeautifulSoup
    import urllib2
    import re

    mynumber = '1234567890'
    url = "http://www.nccptrai.gov.in/nccpregistry/savesearchsub.misc?phoneno=" + mynumber
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    # Take the second table on the page and collect the text of every cell
    table = soup.findAll("table")[1]
    myl = [item.text.strip() for item in table.find_all('td')]

    # Try to pull the values out of the second cell with a regular expression
    re.findall(r'is:\s*[^,]*', myl[1])

The expected output is the 4 parameters mentioned in the first string of the first slice:
['2014-08-07 15:50:00', 'andhra pradesh', 'unitech', '0']
(Note that the date is changed to Y-m-d format.)
The string returned looks like this:
[u'is:\n 31-10-2009 01:11\n\n\nservice area : \n mumbai\n\n\nservice provider :\n idea\n\n\n\n\n\nyour preference :0']
I'd rely on the "Number registered in NCPR" header (it is in a td tag with class gridheader) and get the next rows via find_next_siblings():

    import urllib2
    from bs4 import BeautifulSoup

    mynumber = '1234567890'
    url = "http://www.nccptrai.gov.in/nccpregistry/savesearchsub.misc?phoneno=" + mynumber
    soup = BeautifulSoup(urllib2.urlopen(url))

    # Locate the header cell, then walk the rows that follow its parent row
    header = soup.find('td', class_='gridheader')

    result = []
    for row in header.parent.find_next_siblings('tr'):
        cells = row.find_all('td')
        try:
            # The value sits in the third cell of each row
            result.append(cells[2].get_text(strip=True))
        except IndexError:
            # Skip rows that have fewer than three cells
            continue

    print result

Prints:
[u'07-08-2014 15:50', u'andhra pradesh', u'unitech', u'0']
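
The date in that output is still day-first, while the question asks for Y-m-d. A minimal sketch of the conversion follows, assuming the site always returns the date as DD-MM-YYYY HH:MM (consistent with the two samples shown here, but not verified against the live page). The helper name to_iso is hypothetical, and the result list is simply copied from the output above.

```python
from datetime import datetime


def to_iso(date_text):
    """Convert 'DD-MM-YYYY HH:MM' to 'YYYY-MM-DD HH:MM:SS'.

    Assumes a day-first input format; raises ValueError otherwise.
    """
    parsed = datetime.strptime(date_text, '%d-%m-%Y %H:%M')
    return parsed.strftime('%Y-%m-%d %H:%M:%S')


# Sample values copied from the scraped output above
result = [u'07-08-2014 15:50', u'andhra pradesh', u'unitech', u'0']

# Only the first element is a date, so convert it in place
result[0] = to_iso(result[0])
print(result)
```

With the sample values, the first element becomes '2014-08-07 15:50:00', which matches the date in the expected output from the question. If the page ever returns a different layout, strptime raises ValueError, so it may be worth wrapping the call in a try/except.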