HTML Table Reader

Licensing options for this format begin with FME Professional Edition.

The HTML Table Reader provides FME with the ability to read table and list data from HTML documents.

Overview

HTML (Hypertext Markup Language) is used on the internet to format documents for display in web browsers. While its primary purpose is not to store data in machine-readable form, table and list elements often contain useful data. Although HTML resembles XML, it is not compatible with strict XML parsing. As a further complication, because web browsers parse leniently, an HTML document does not have to follow the HTML specification fully to display reasonably well.
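The difference between strict XML parsing and lenient HTML parsing can be illustrated with a minimal Python sketch. It uses only the standard library and is not how FME itself parses documents; the sample fragment and the helper names `parses_as_xml` and `cell_text` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A fragment that browsers display without complaint: <br> is never
# closed and the attribute value is unquoted.
SNIPPET = "<table><tr><td class=num>1<br>2</td></tr></table>"


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class CellText(HTMLParser):
    """Lenient parser that collects the text found inside <td> cells."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.chunks.append(data)


def cell_text(text):
    """Feed the text to the lenient parser and return the cell text chunks."""
    parser = CellText()
    parser.feed(text)
    parser.close()
    return parser.chunks
```

Here `parses_as_xml(SNIPPET)` is false because the fragment is not well-formed XML, while `cell_text(SNIPPET)` still recovers the cell contents.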

The HTML Table Reader lists all of the table and list (ul and ol) elements in the HTML document and allows you to select which tables or lists to read. Note that the feature type names for the tables and lists are determined by the Table Name From reader parameter.
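As a rough illustration of what listing the table and list elements involves, the following Python sketch walks a document and records each table, ul, and ol element it encounters. The class and function names are hypothetical, and recording the `id` attribute is just one possible way to label an element; it is not a statement about what the Table Name From parameter uses.

```python
from html.parser import HTMLParser


class ElementLister(HTMLParser):
    """Record every <table>, <ul>, and <ol> start tag in document order."""

    WANTED = {"table", "ul", "ol"}

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            # Keep the id attribute (None if absent) as a hypothetical label.
            self.found.append((tag, dict(attrs).get("id")))


def list_elements(html_text):
    """Return (tag, id) pairs for each table or list element found."""
    lister = ElementLister()
    lister.feed(html_text)
    lister.close()
    return lister.found
```

Nested tables and lists are reported as separate entries in this sketch, since every matching start tag is recorded.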

Attribute names are determined from the table header when reading an HTML table that contains a header. For lists, or tables without a header row, attribute names will be generated. HTML tables without a header row will have attributes Col1 through ColN, while columns that contain row headers but have no column heading will be named RowHeading1 through RowHeadingN, where N in both cases is the number of columns. Attribute types in both tables and lists are determined by scanning the data rows.
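The naming rules above can be sketched in Python as follows. This is an interpretation, not the reader's actual implementation: the function name and its parameters are hypothetical, and the sketch assumes that the numeric suffix of a generated name is the 1-based column position and that a column with neither a heading nor row headers falls back to a Col name.

```python
def attribute_names(header, column_count, row_header_columns=()):
    """Return one attribute name per column.

    header: list of heading strings (empty string where a heading is
            missing), or None when the table has no header row.
    row_header_columns: 0-based indexes of columns that hold row headers.
    """
    names = []
    for index in range(column_count):
        position = index + 1  # generated names are 1-based
        heading = ""
        if header and index < len(header):
            heading = header[index].strip()
        if heading:
            # A column heading from the header row is used as-is.
            names.append(heading)
        elif index in row_header_columns:
            # Assumption: the suffix is the column position.
            names.append(f"RowHeading{position}")
        else:
            names.append(f"Col{position}")
    return names
```

For example, a three-column table with no header row yields Col1, Col2, and Col3, while a header row of `["", "Price", "Qty"]` with row headers in the first column yields RowHeading1, Price, and Qty.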

When reading an HTML table, one feature is produced for every row of the table. For each HTML list, a single feature is produced, and the list contents are stored in a single attribute called html_list_content.
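The row-to-feature and list-to-feature mapping can be pictured with a short Python sketch in which a feature is modeled as a plain dictionary of attributes. The function names are hypothetical, and storing the list items as a Python list is an assumption; the text above does not specify how the contents are encoded inside html_list_content.

```python
def table_features(names, rows):
    """Build one feature (a dict of attributes) per table row."""
    return [dict(zip(names, row)) for row in rows]


def list_feature(items):
    """Build a single feature for a whole HTML list."""
    # Assumption: the items are kept as a Python list inside the attribute.
    return {"html_list_content": list(items)}
```

A table with two data rows therefore produces two features, whereas a list with any number of items produces exactly one.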

HTML File Extensions

By convention, HTML files have the extension .htm or .html. However, web URLs will often have no file extension, or will reflect the source script used to generate the HTML output, such as .php or .asp.

Note that URLs that generate HTML pages are valid datasets, provided the request to the URL returns valid HTML. The HTML Table Reader allows any file extension when reading from disk.

Reader Overview

The HTML Table Reader parses features from the document.

Schema Scanning

Since the values in an HTML table do not have an associated schema, FME scans the table to determine reasonable data types for each attribute. In the case of lists, or tables without a header row, generic attribute names will be generated.
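The idea of scanning data rows to choose a type can be sketched in Python as below. This is a simplified illustration rather than FME's scanning logic: the type labels "int", "float", and "string" are hypothetical stand-ins and not FME attribute types, and the sketch assumes every row has the same number of cells.

```python
def infer_type(values):
    """Pick the narrowest of int, float, or string that fits every value."""
    texts = [value.strip() for value in values if value.strip()]
    if not texts:
        return "string"  # nothing to scan, so fall back to text

    def all_parse(convert):
        # True only if every non-empty value converts without error.
        for text in texts:
            try:
                convert(text)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "string"


def infer_schema(names, rows):
    """Map each attribute name to the type inferred from its column."""
    columns = zip(*rows)  # transpose rows into columns
    return {name: infer_type(column) for name, column in zip(names, columns)}
```

Trying the narrowest type first means that a single non-numeric cell is enough to turn the whole column into a string attribute, which is the conservative choice when no schema is available.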

Workbench Reader Dataset

The value for the Reader Dataset is the path or URL to an HTML document.