Python - Extract info from a string


The following code works, but I am not able to extract the information I need. Can I use BeautifulSoup for this, or do I need a regular expression?

    import re
    import urllib2

    from bs4 import BeautifulSoup

    mynumber = '1234567890'
    url = "http://www.nccptrai.gov.in/nccpregistry/savesearchsub.misc?phoneno=" + mynumber
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    # Take the second table on the page and collect the text of each cell.
    table = soup.findAll("table")[1]
    myl = [item.text.strip() for item in table.find_all('td')]

    re.findall(r'is:\s*[^,]*', myl[1])

The expected output is the 4 parameters mentioned in the first string of the first slice:

['2014-08-07 15:50:00', 'andhra pradesh', 'unitech', '0'] 

(Note that the date has been changed to Y-m-d format.)

The string returned looks like this:

[u'is:\n 31-10-2009 01:11\n\n\nservice area : \n mumbai\n\n\nservice provider :\n idea\n\n\n\n\n\nyour preference :0'] 
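On the regular-expression side of the question: a string of this shape can be split into its four values with a single pattern. This is only a sketch. It assumes the labels always appear in this order and with this spelling, and `extract_fields` is a hypothetical helper name, not something from the original code.

```python
import re

# One capture group per value; the labels act as separators between them.
# Assumption: the labels always appear in this order and with this spelling.
FIELDS = re.compile(
    r'is:\s*(.*?)\s*'
    r'service area\s*:\s*(.*?)\s*'
    r'service provider\s*:\s*(.*?)\s*'
    r'your preference\s*:\s*(.*)',
    re.IGNORECASE | re.DOTALL
)


def extract_fields(text):
    """Return the four values as a list, or an empty list if nothing matches."""
    match = FIELDS.search(text)
    if match is None:
        return []
    return [group.strip() for group in match.groups()]


# The sample string shown above, split over two lines for readability.
sample = (u'is:\n 31-10-2009 01:11\n\n\nservice area : \n mumbai\n\n\n'
          u'service provider :\n idea\n\n\n\n\n\nyour preference :0')

fields = extract_fields(sample)
print(fields)
```

For the sample above, `fields` holds the date string, `mumbai`, `idea` and `0`, in that order. The date is still in day-first form at this point.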

I'd rely on the "number registered in NCPR" header (it is in a td tag with class gridheader), then get the next rows via find_next_siblings():

    import urllib2
    from bs4 import BeautifulSoup

    mynumber = '1234567890'
    url = "http://www.nccptrai.gov.in/nccpregistry/savesearchsub.misc?phoneno=" + mynumber

    soup = BeautifulSoup(urllib2.urlopen(url))

    # Locate the header cell, then walk the rows that follow its parent row.
    header = soup.find('td', class_='gridheader')

    result = []
    for row in header.parent.find_next_siblings('tr'):
        cells = row.find_all('td')
        try:
            # The value sits in the third cell of each row.
            result.append(cells[2].get_text(strip=True))
        except IndexError:
            # Skip rows that have fewer than three cells.
            continue
    print result

This prints:

[u'07-08-2014 15:50', u'andhra pradesh', u'unitech', u'0'] 
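The date in this output is still day-first, while the question asks for Y-m-d. One way to convert it is with `datetime`, sketched below. It assumes the site always returns the date as `dd-mm-YYYY HH:MM`, which is what both samples in this post show, and `normalize_date` is a hypothetical helper name.

```python
from datetime import datetime


def normalize_date(value):
    """Convert 'dd-mm-YYYY HH:MM' into 'YYYY-mm-dd HH:MM:SS'."""
    # Assumption: the input is day-first, as in '31-10-2009 01:11'.
    parsed = datetime.strptime(value, '%d-%m-%Y %H:%M')
    return parsed.strftime('%Y-%m-%d %H:%M:%S')


# The list printed above, copied here so the snippet runs without network access.
result = [u'07-08-2014 15:50', u'andhra pradesh', u'unitech', u'0']
result[0] = normalize_date(result[0])
print(result)
```

After this step the first element is `2014-08-07 15:50:00`, which matches the expected output in the question. `strptime` raises `ValueError` if the site ever returns a different format, so wrap the call in a `try`/`except` if that is a concern.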
