Python - extract info from a string

- April 15, 2010

The following code is working, but I am not able to extract the information I need. Can I use soup, or do I need a regular expression?

    import re
    import urllib2

    from bs4 import BeautifulSoup

    mynumber = '1234567890'
    url = "http://www.nccptrai.gov.in/nccpregistry/savesearchsub.misc?phoneno=" + mynumber
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    # Take the second table on the page and collect the stripped text of its cells.
    table = soup.find_all("table")[1]
    myl = [item.text.strip() for item in table.find_all('td')]

    # Grab everything after "is:" in the second cell.
    re.findall(r'is:\s*[^,]*', myl[1])

The expected output is the 4 parameters mentioned in the first string of the first slice.

['2014-08-07 15:50:00', 'andhra pradesh', 'unitech', '0']

(Note that the date is changed to Y-m-d.)

The string returned looks like this...

[u'is:\n 31-10-2009 01:11\n\n\nservice area : \n mumbai\n\n\nservice provider :\n idea\n\n\n\n\n\nyour preference :0']
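If a regular expression is preferred, a minimal sketch against the sample string above could look like the following. It assumes the labels appear in exactly this order and wording; the `raw` value is copied from the string shown above rather than fetched from the site, and the group names are made up for illustration.

```python
import re

# Sample text copied from the string shown above (not fetched live).
raw = (u'is:\n 31-10-2009 01:11\n\n\nservice area : \n mumbai\n\n\n'
       u'service provider :\n idea\n\n\n\n\n\nyour preference :0')

# One named group per field; \s* swallows the newlines and padding
# around each label. IGNORECASE is an assumption, in case the real
# page capitalizes the labels differently.
pattern = re.compile(
    r'is:\s*(?P<date>[^\n]+)\s*'
    r'service area\s*:\s*(?P<area>[^\n]+)\s*'
    r'service provider\s*:\s*(?P<provider>[^\n]+)\s*'
    r'your preference\s*:\s*(?P<preference>\S+)',
    re.IGNORECASE,
)

match = pattern.search(raw)
if match is None:
    raise ValueError('unexpected page text')

names = ('date', 'area', 'provider', 'preference')
fields = [match.group(name).strip() for name in names]
print(fields)
```

For the sample string, `fields` holds the date `31-10-2009 01:11`, then `mumbai`, `idea` and `0`. A pattern like this breaks as soon as the page wording changes, which is one reason to prefer walking the table cells instead.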

I'd rely on the "number registered in NCPR" header (it is in a td tag with class gridheader) and get the next rows via find_next_siblings():

    import urllib2

    from bs4 import BeautifulSoup

    mynumber = '1234567890'
    url = "http://www.nccptrai.gov.in/nccpregistry/savesearchsub.misc?phoneno=" + mynumber

    soup = BeautifulSoup(urllib2.urlopen(url))

    # Locate the header cell, then walk the rows that follow its parent row.
    header = soup.find('td', class_='gridheader')

    result = []
    for row in header.parent.find_next_siblings('tr'):
        cells = row.find_all('td')
        try:
            # The value sits in the third cell of each row.
            result.append(cells[2].get_text(strip=True))
        except IndexError:
            # Skip rows that have fewer than three cells.
            continue
    print result

Prints:

[u'07-08-2014 15:50', u'andhra pradesh', u'unitech', u'0']
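The date in this output is still day-month-year, while the question asks for Y-m-d. A minimal sketch of that last step with the standard `datetime` module, assuming the first element always has the form `DD-MM-YYYY HH:MM` as in the output above (the `result` list here is copied from that output, not fetched live):

```python
from datetime import datetime

# Values copied from the printed output above.
result = [u'07-08-2014 15:50', u'andhra pradesh', u'unitech', u'0']

# Parse the day-month-year timestamp, then format it as
# year-month-day with a seconds field.
parsed = datetime.strptime(result[0], '%d-%m-%Y %H:%M')
result[0] = parsed.strftime('%Y-%m-%d %H:%M:%S')

print(result)
```

After this step the first element is `2014-08-07 15:50:00`, which matches the expected output in the question. `strptime` raises `ValueError` if the site ever returns the date in a different layout, so the failure would at least be visible.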
