Report:PatBase/Viewing Results/Analyzing Results
From Intellogist
Analyzing Results
Like other search systems, PatBase offers a few graphical analysis and display options for results sets. Two of these features, the Visual Explorer and the View Snapshot options, are available for results in a saved folder and are accessible in the folder's hit list menu of options. (Data from saved folders can also be easily placed into the search history page via the "add to history" option; see Saving Results for more information.)
The simple analysis features, which include the Snapshot feature and basic statistical analysis tools, appear to be meant for search planning purposes: identifying classification areas or assignees relevant to a particular subject matter and pinpointing the best candidates to include in a search strategy. The Visual Explorer Tool (described at the end of this section) also seems best suited for search planning purposes, for identifying relevant keywords, applicants, and IPC classifications to add to the query. The keyword analysis and advanced statistical analysis modules may be more useful for in-depth analysis projects.
The Snapshot Feature
In 2009, Minesoft added the "Snapshot" feature to both its PatBase and PatBase Express products to provide a quick overview of a search results set. New fields were added to the Snapshot tool in October 2011, including a citations timeline and a keywords visualization wheel.[1] This feature is accessible directly from the "more..." menu on the search history page. Links to this feature are also available from the statistical analysis dialog screens.
Only the first 2500 results in the data set are used to produce the Snapshot. When this option is chosen, a graphical overview is presented of five properties of the selected data set: IPC classes, applicant names, countries of publication, US classes, and citations. A "visualize" option, which will produce a keyword/IPC/applicant visualization wheel (using the Visual Explorer Tool), is also available in the Snapshot window. In addition, a breakdown of the data set by publication year is always shown in the top panel of the Snapshot (except for the visualize option).
Each graph within the Snapshot can be converted to a pie chart, and users have the option to save each graph as a separate image file. Options on the left menu of the snapshot page include the ability to save the output as a PDF document, save the chart data as an XLS spreadsheet, or print the page. (Note that the XLS spreadsheet will contain the data necessary to reproduce the chart in Excel, e.g. number of documents published per year, rather than any individual patent bibliographic data.)
The Snapshot is an interactive display. The graph of publication year is static, but it can be used to filter the graph in the lower panel by limiting the display to a particular decade or year for further analysis. Furthermore, in the dynamic lower panel, users may select any element of the bar or pie chart to see a further breakdown of the selected IPC code, applicant, country, or US Class (according to the entire data set).
In the example below, a chart of the US classes in the data set was limited to those US classes on documents published in the 1990s by selecting the bar labeled "1990s" in the upper portion of the graph. The display was then further limited to only those US classes related to class 514 by selecting the "drug, bio-affecting and body treating compositions" class, 514, from the list of US classes in the graph in the lower panel.
The resulting display is a chart showing the frequency of occurrence of the "drug, bio-affecting and body treating compositions" classes on all documents in the record set (note that the Snapshot feature only analyzes the first 2500 documents in the results set being analyzed). The total number of occurrences corresponding to class 514 in the first graph (which only looks at documents published in the 1990s) is 69, but the total number of occurrences listed in the second graph that looks at the breakdown of classification 514 appears to be around 490 occurrences. Therefore, the user can conclude that the second graph is looking at the breakdown of class 514 using all occurrences of that class in the dataset, instead of only class occurrences on records from the 1990s. Note that a return arrow icon is now present on the graph in the lower panel; it can be used to return to the second step of the analysis, the breakdown of all US classes in the 1990s.
Editor's Note: It was a good idea for the Minesoft developers to include links to this quick Snapshot overview from the initial menus for the statistical analysis options (which are discussed further below). Because the Snapshot is generated so quickly, users who just need a quick overview can use it rather than the more computationally intensive analysis options, saving their own time and reducing the strain on Minesoft's servers.
One quirk of the Snapshot feature, however, is that the tool will allow the user to drill down to particular decades or years, but the user can't drill down further within the classification codes in that time period. Instead, if the user wants to drill deeper into the classification occurrences within the decade/year bar graph, the snapshot tool will produce a more detailed breakdown of classification occurrences that applies to the entire dataset, instead of applying to only the records within that year or decade. See the screenshots above for an example.
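The difference between the two behaviors can be made concrete with a small sketch. The records below are invented for illustration and are not PatBase data; the code only contrasts a breakdown restricted to the selected decade with a breakdown over the whole data set, which is what the Snapshot appears to display.

```python
from collections import Counter

# Hypothetical records: (publication year, US subclass). Not real PatBase data.
records = [
    (1994, "514/2"),
    (1997, "514/44"),
    (2003, "514/2"),
    (2005, "514/12"),
    (2008, "514/44"),
]

def in_1990s(year):
    return 1990 <= year <= 1999

# What a user might expect: subclass counts restricted to the selected decade.
filtered = Counter(sub for year, sub in records if in_1990s(year))

# What the Snapshot appears to show: subclass counts over the whole data set.
unfiltered = Counter(sub for _, sub in records)

print(sum(filtered.values()), sum(unfiltered.values()))
```

A mismatch between the two totals, like the 69 versus roughly 490 occurrences in the example above, is the sign that the decade filter was not carried into the second breakdown.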
The Snapshot feature recently added a citations timeline graph, which is similar to the citations line graph viewable through the main citations window for each record (see the section Viewing Citations Graphically for more information on this feature). Based on the citation data for the first 250 results, the citation graph displays the number of occurrences of backwards and forwards citations over a period of years. The graph can be saved as a PDF, an XLS spreadsheet, or a JPEG image, or it can be printed.
The snapshot window also now includes a link to a Visualize tool, which uses the Visual Explorer Tool to create interactive visualization wheels of top keywords and IPC classes and a graphic display of top applicants. The user can expand or minimize the sections in the keyword/IPC interactive wheels by clicking on the pink arrows.
Editor's Note: The visualize option in the Snapshot window doesn't have the same utility as the Visual Explorer tool accessible through the search history page, since the visual wheels and displays in the Snapshot window don't allow the user to view a hit list of relevant results by selecting the keywords/classes/applicants listed in the visualizations.
The Classification and Assignee Statistical Analysis Features
The statistical analysis features are accessed from the search history page, as shown in the figure below.
After the initial query is conducted, the statistical analysis function can be selected from the “more…” menu for that query on the search history page. The user must then select the topic of interest: classification (only one of IPC, ECLA, US, or F-Term), assignee, keywords, or advanced analysis. Each analysis option is available via individual tabs at the top of the work area (these tabs can be seen in the screenshot below). The keyword analysis and advanced statistical analysis features are discussed in further detail later in this section of the article.
The assignee and classification analysis options can analyze up to 5000 records, and can display all the data points in the resulting analysis; however, the default settings are to analyze the first 250 records and show the top 10 results. The user may also choose to have a Microsoft Excel CSV file generated with the data points and occurrence frequency. No other export formats are available from the analysis page; however, a user with a little patience can generate graphs in Microsoft Excel or another program using the full export functions available from the hit list. (For more information about exporting data in PatBase, see Export Functions.)
For classification analysis, the user must choose which classification system to conduct the analysis on: IPC, ECLA, US, and F-Term classifications are available. Once the analysis function is completed, a list of data points and frequency of occurrence is generated along with a graphical pie-chart representation of the data. In the classification analysis, all the resulting classes are hyperlinked, so that classification definitions can be called up via a separate window. The ability to review classification definitions from this page is convenient, because the next step in the process (which also occurs from this page) is that the user may select classifications of particular interest, and conduct a search on these right from the analysis page.
The end result of a classification analysis can be shown as a list of the top classes by frequency; a list of top classes by frequency, subdivided by the various levels of the IPC class hierarchy, with definitions included in the chart (as in the screenshot below); or a graph of these results (either bar or pie style). Users may select one or more of these classifications to make up a new search query through the "Search selected classes" link. A link to the advanced statistical analysis feature for classifications is also provided (this feature is discussed further in the advanced statistical analysis section).
The end result of assignee analysis is a list of the top assignees, and a pie chart showing their frequency. In the example below, a company has patents with multiple company name variations, leading to an inaccurate frequency count and misleading results.
Editor's Note: Like the analysis features available on other systems, PatBase assignee analysis should never be relied upon for serious analysis projects. The analysis is performed on assignee data from patent offices, and while some effort has been made to standardize this data, the result is far from perfect. In the figure above, the company name Ciba Geigy appears in three different forms. Therefore, while this feature can provide some casual insight into the companies that hold IP in the field, it should never be relied upon as accurate. As with any other assignee count feature in any other patent search tool, this feature is best used to give an at-a-glance impression, and serious assignee analysis projects should always involve advanced data cleaning efforts.
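As a rough illustration of what such data cleaning can involve, the sketch below collapses name variants before counting. The variant spellings, the `SUFFIXES` list, and the `normalize` helper are all assumptions made for this example; they are not the three forms shown in the figure and not a PatBase function, and real cleaning projects usually need far more rules plus manual review.

```python
import re
from collections import Counter

# Legal-form tokens to drop; an illustrative list, not an exhaustive one.
SUFFIXES = {"AG", "CORP", "CORPORATION", "INC", "LTD", "GMBH", "CO"}

def normalize(name):
    """Upper-case, turn punctuation into spaces, and drop legal-form tokens."""
    tokens = re.sub(r"[^A-Z0-9]+", " ", name.upper()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

# Hypothetical raw assignee strings exported from a results set.
raw = [
    "CIBA GEIGY AG",
    "Ciba-Geigy Corp.",
    "CIBA-GEIGY CORPORATION",
    "Example Pharma Inc",
]

raw_counts = Counter(raw)                            # each variant counted separately
clean_counts = Counter(normalize(n) for n in raw)    # variants merged

print(clean_counts.most_common())
```

Before normalization, each spelling counts as its own assignee with a frequency of one; afterwards the three variants are counted together.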
Keyword Analysis Tool
The VizPat Keyword Analysis is a tool that helps searchers determine the most frequently occurring keywords within a given set of patents, and to display those results in table or graphical form. The initial user interface for this tool is similar to the analysis tools described above: users input the number of records to be analyzed, number of top keywords to display, and which text fields within the documents to analyze. The system can analyze up to 2,000 records, and display up to 100 of the top keywords occurring within those records. The text field options from within the patent family record which can be analyzed include: title, abstract and claims; title and abstract only; claims only; US claims only; DE claims only; or FR claims only. The last option in this dialog window allows the user to automatically generate a CSV file of the analysis results.
The tool returns a hit list of the keywords and their frequency that may be manipulated further. One may delete keywords that are not particularly useful or relevant to the analysis, and/or merge related keywords together. In addition, users may also select one or more keywords of interest from this list to create a new search string (this is done by selecting the appropriate keywords via checkbox and choosing "Search Selected," as seen in the figure below). Users can even graph the results of their keyword analysis (as discussed further in the next paragraph). In the figure below, several keyword terms were deleted from the list by the user; these deleted terms appear in the message that can be seen at the top of the figure.
The VizPat Advanced Text Analysis tool does not rely on any automated linguistic analysis, as in text mining. Rather, the tool relies on the concept that frequently occurring word pairs within a defined proximity can represent major concepts expressed in a patent document. Choosing the Advanced Text Analysis option organizes the keyword hits from the initial analysis into the most frequently occurring word pairs in the data set, and a graph of the results will also become available. Before initiating the analysis, users must select the maximum allowed distance between two words for the system to consider them a "pair"; the drop-down menu labeled "proximity" in the figure below must be used to set this distance before generating the Advanced Text Analysis window.
The figure below shows the results of Advanced Text Analysis: each word pair represents a set of words that occur within the allowed proximity distance, and the number next to each pair shows how frequently it occurs within the analyzed documents.
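The general idea of counting word pairs within a proximity window can be sketched as follows. This is a minimal illustration of the concept only, under the assumptions that pairs are unordered, that distance is measured in word positions, and that no stop words are removed; how PatBase actually tokenizes text or counts hits is not documented in the source. The function name `word_pairs` and the sample sentence are invented for the example.

```python
from collections import Counter

def word_pairs(tokens, proximity):
    """Count unordered pairs of distinct words at most `proximity` positions apart."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at the next `proximity` words only.
        for right in tokens[i + 1 : i + 1 + proximity]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts

# Invented sample text, not taken from a patent record.
tokens = "hydraulic fluid flows through the valve into the hydraulic cylinder".split()
pairs = word_pairs(tokens, proximity=2)

print(pairs.most_common(3))
```

Widening the `proximity` value produces more pairs, including pairs of words that are only loosely related, which is why the setting has to be chosen before the analysis is generated.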
The "View Graph" button in the figure above will generate a graphical representation of the data above, where the X axis represents the frequency of the word pair in the data set, and the size of the dot for each word pair also represents its frequency. The Y axis represents the "rank" of the word pair, which is calculated by considering the number of other times each word in the word pair appears in another set (more on this below).
The rank of a word set in the graph is determined by the other words in the analysis. For example, in the following set:[2]
| Word Pair | Hits | Rank |
| --- | --- | --- |
| fluid,(valve or pressure) | 20 | 7 |
| direction,(valve or pressure) | 9 | 6 |
| flow,(valve or pressure) | 7 | 6 |
| hydraulic,(valve or pressure) | 23 | 6 |
| direction,fluid | 9 | 5 |
| flow,fluid | 23 | 5 |
| fluid,hydraulic | 17 | 5 |
| direction,flow | 7 | 4 |
| chamber,(valve or pressure) | 17 | 4 |
| cylinder,hydraulic | 13 | 2 |
Consider the pair (cylinder, hydraulic). The word "hydraulic" appears in the set 2 other times, whereas the word "cylinder" does not appear anywhere else in the set. The rank is calculated by adding these two numbers together; 2 + 0 = 2, the overall rank of the pair.
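The same calculation can be applied to every row of the table. The sketch below is a reconstruction from the description above, not PatBase's own code: for each pair it counts how many other pairs in the set contain each of its two terms and adds the two counts. The combined term "valve or pressure" is treated as a single term, as in the table.

```python
# Word pairs from the table above, in the same order.
pairs = [
    ("fluid", "valve or pressure"),
    ("direction", "valve or pressure"),
    ("flow", "valve or pressure"),
    ("hydraulic", "valve or pressure"),
    ("direction", "fluid"),
    ("flow", "fluid"),
    ("fluid", "hydraulic"),
    ("direction", "flow"),
    ("chamber", "valve or pressure"),
    ("cylinder", "hydraulic"),
]

def rank(pair, all_pairs):
    """Sum, over both terms of `pair`, the number of other pairs containing that term."""
    others = [p for p in all_pairs if p != pair]
    return sum(term in other for term in pair for other in others)

for pair in pairs:
    print(pair, rank(pair, pairs))
```

Run against this set, the function reproduces the Rank column: for example, "fluid" appears in three other pairs and "valve or pressure" in four, giving the first row its rank of 7.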
Editor's Note: Because of the underlying concept of analyzing word pairs, the Advanced Text Analysis tool is not always suited for analyzing highly mechanical inventions, or any field where word pairs cannot accurately describe inventive concepts. For example, a sample search on a protective helmet returned word pairs such as "inner, outer" and "safety, helmet." However, despite the inherent limitations, users may still appreciate the expansion of their analysis toolbox. Users comparing tools may find it interesting that PatBase now provides a "keyword clustering" tool through its Visual Explorer tool, like those offered in Thomson Reuters products such as Delphion and Thomson Innovation.
In addition to the problems with the concept of the Advanced Text Analysis tool, one functional weakness is that it does not offer a way to view the individual patent documents that contain a certain word pair – the graphs are not "clickable" or interactive to drill down into the underlying patent documents.
One practical problem with the tool is that after deleting or merging keyword terms, users cannot undo any of these changes; the analysis has to be re-run from scratch to recover those terms. In addition, if the "generate CSV" option is not selected before the keyword analysis is launched, the user must go back to this dialog window and re-execute the analysis to generate the file, which costs another round of processing time.
Finally, users should be aware that keyword analysis projects will run slowly when a large number of results are processed (the limit is 2,000 documents and the top 100 keywords). The time will vary depending upon internet connection, browser configurations, and how many records are being processed. This applies to any of the PatBase analysis tools.
The Advanced Statistical Analysis Tool
The beta version of the Advanced Statistical Analysis tool, introduced in 2007, was renamed VizPat Advanced Statistical Analysis tool in May 2008. This tool performs essentially the same function as those described above, but it adds some advanced options, including the ability to manually clean data by merging duplicate data points. Analysis can be performed on classification, assignee, inventor, country and date (only one may be performed at a time). All of these options can handle up to 5,000 initial records, and can display all unique data points. In addition, some of these features can be used to generate a CSV file of the analysis results.
The figure below shows the advanced analysis dialog window for the assignee option. Hovering the cursor over the yellow info circles provides more information about each field (more information about the unique fields in each dialog window is presented at the end of this section).
Once the data set is cleaned, multiple graphs, including 3D graphs in the style of a Microsoft Excel Pivot chart, can be generated using the advanced features. The 3D landscape graph can be “grabbed” and rotated the way a pivot chart can, by holding the mouse and rotating the graph. Good perspectives, once achieved, can be downloaded as .jpg image files. The graphing options and other options unique to specific data types are discussed below. For all except the country analysis, options include graphs with the following X-axes: earliest priority year, earliest publication year, international (IPC) classification, family countries, and earliest priority country.
Assignee analysis: One new feature added to the advanced analysis option is the “inventor remove” option, which, when selected, apparently removes any individual inventor names that are listed as patent assignees (unless there is only one assignee that is also listed as the inventor).[2] In addition, in this module, users can choose whether to conduct the analysis on only the "first occurring" assignee in the PatBase family record, or all assignees.
Editor's Note: The interface does not make clear whether the "first occurring" assignee is determined by publication data or by some other measure such as alphabetical order. A PatBase representative clarified that the "first occurring" assignee is determined by alphabetical order.[3]
Classification analysis: The advanced classification analysis does not offer any new features except the ability to clean and merge duplicate data points. This feature is not yet available for F-term classifications.
Inventor analysis: This feature analyzes all the inventors listed on each record (not just the first listed). Analyzing inventor data for 500 family records generated over 2,000 unique inventor names listed in the results set.
Country analysis: The title is ambiguous, but this feature will analyze the country of publication. For this data element, the advanced data cleaning features are not needed: country codes are specifically designed not to be duplicative or ambiguous. Perhaps the developers of this feature thought that users would merge countries with low frequency counts, to create an “other” category. Graphing options for the X-axis for country analysis only include earliest priority year, and earliest publication year.
Date analysis: This feature analyzes the year of each record. Users can select the date type (earliest priority, earliest publication, latest priority, or latest publication) and the date length (YYYY, YYYYMM, or YYYYMMDD).[2] Graphing options on the X-axis include only international classification, family countries, and earliest priority country.
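Two of the ideas above, merging low-frequency countries into an "other" category and grouping records by date length, can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how PatBase implements either option: the function names, the `OTHER` label, the `min_count` threshold, and the sample counts and dates are all hypothetical.

```python
from collections import Counter

def merge_low_frequency(counts, min_count, label="OTHER"):
    """Fold every data point seen fewer than `min_count` times into one bucket."""
    merged = Counter()
    for key, n in counts.items():
        merged[key if n >= min_count else label] += n
    return merged

# Character widths for the three date lengths named in the text.
DATE_WIDTHS = {"YYYY": 4, "YYYYMM": 6, "YYYYMMDD": 8}

def count_by_date(dates, length):
    """Group YYYYMMDD date strings by the chosen date length."""
    width = DATE_WIDTHS[length]
    return Counter(d[:width] for d in dates)

# Hypothetical publication-country counts.
country_counts = Counter({"US": 120, "EP": 80, "JP": 45, "BR": 3, "ZA": 2})
merged = merge_low_frequency(country_counts, min_count=10)

# Hypothetical earliest priority dates.
dates = ["20070315", "20070322", "20080501"]
by_year = count_by_date(dates, "YYYY")
by_month = count_by_date(dates, "YYYYMM")

print(merged, by_year, by_month)
```

Shorter date lengths give fewer, larger groups, while YYYYMMDD treats nearly every record as its own data point, so the choice of length largely determines how readable the resulting graph is.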
Visual Explorer Tool
The Visual Explorer tool, which was integrated into PatBase in October 2011,[1] is a "clustering and visualization tool" that allows users to view top keywords, applicants, and IPC classes (taken from the first 2000 records within a results set).[2] The tool is accessible through the search history page under the "more..." option beside each search query (as illustrated in the screenshot below).
Through the Visual Explorer tool, users can choose from four visualization options, according to the PatBase Manual:[2]
- Keywords (default view) - A dynamic wheel is displayed with segments containing keywords. The size of the segment is proportional to the breakdown of keywords within the results set. The greater the occurrence of a keyword: the larger the segment within the wheel. Additional tiers going out from the centre indicate an additional breakdown of keywords within a segment. Wherever a pink arrow occurs, this indicates there are additional smaller segments and clicking on the arrow itself will expand the wheel to display these smaller segments. As you mouse over segments, corresponding text will also appear in larger fonts within the centre of the wheel. Click on a segment within the dynamic wheel containing keyword(s) of interest to drill down to the relevant records.
- Keywords+ - This provides a visual breakdown of keywords occurring within the Title and Abstract, combined with the complete definition of the main IPCs occurring in the results set. (Like "keywords," this visualization is displayed as an interactive wheel.)
- Applicants - This provides a visual breakdown of Assignees occurring within the results set (inventor and assignee duplicates are automatically removed). (This visualization appears as a large box, with colored segments representing a particular applicant.)
- IPC Class - This provides a visual breakdown by the International Patent Classification (IPC) system within the results set. (Like "keywords," this visualization is displayed as an interactive wheel.)
When the user selects a keyword, IPC term, or applicant from the visualization, they are automatically taken to a hit list that displays a set of records from the original record set that correspond with the selected criteria. The user will see the term " [VISUAL EXPLORER]: <criteria>" displayed at the end of the query, and users have the option through the hit list menu to add all displayed search results to the search history. If the user selects to add the Visual Explorer results to the search history, the resulting query will simply be a list of all family record numbers that appeared in the result set. The user can return to the Visual Explorer visualization from this hit list by selecting the icon below the search query "Return to Visual Explorer."
Editor's Note: The Visual Explorer tool is similar to the Snapshot feature in that it only conducts analysis on a limited number of results (2,000), and it may be useful for identifying new keywords, IPC codes, and applicant names to add to the search query. It also acts as a keyword clustering tool, since it organizes related top keywords into a graphical display, with more common keywords displayed in a larger section of the wheel. It is also useful that users can view documents related to the keywords and save these results in the search history for further manipulation.
The Visual Explorer tool has two downsides: it offers no manual data cleaning options, and its visualizations can't be exported or saved in any way.
Sources
- [1] "PatBase user news." PatBase website (restricted), http://www.patbase.com/wnewinfo.asp?i=173. Accessed December 9, 2011.
- [2] "PatBase Manual." PatBase website (restricted), http://www.patbase.com/Manual.pdf. Accessed December 9, 2011.
- [3] Email correspondence with PatBase representative. Received December 20, 2011.


