Report:Thomson Innovation/Viewing Results/Analyzing Results/Text Clustering
From Intellogist
This search system report was created by the Intellogist Team.
Text Clustering
Note: The analysis features of Thomson Innovation are described from a prior art searching perspective in this report.
Like Delphion, an older Thomson search system, Innovation offers a basic text mining analysis feature that creates patent "clusters" according to shared keywords. The theory behind this tool is that keyword analysis of the patent text can automatically group patent (or literature) documents around shared concepts. The clustering tool is available only through an Analyst-type subscription to Thomson Innovation (to learn more about Innovation subscription types, see Pricing Policy), where it appears under the "Analyze" drop-down menu on the Result Set toolbar. As with other features on this menu, users can either select only certain records for analysis or work with the entire data set. According to the user materials, up to 10,000 patent records or literature records can be analyzed by the clustering tool at once.[1]
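The general idea of grouping documents around shared keywords can be illustrated with a toy sketch. This is not Thomson Innovation's algorithm, which the user materials cited here do not describe; the stopword list, the function names, the sample records, and the rule of labeling each document by its most widely shared keyword are all assumptions made purely for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real tool would use a much larger one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def keywords(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def cluster_by_shared_keyword(docs):
    """Assign each document to exactly one cluster, labeled by the keyword
    that it shares with the largest number of documents in the set."""
    doc_terms = {doc_id: keywords(text) for doc_id, text in docs.items()}
    # Document frequency: in how many documents does each keyword occur?
    doc_freq = Counter(t for terms in doc_terms.values() for t in terms)
    clusters = defaultdict(list)
    for doc_id, terms in doc_terms.items():
        shared = [t for t in terms if doc_freq[t] > 1]
        # Highest document frequency wins; ties are broken alphabetically.
        if shared:
            label = min(shared, key=lambda t: (-doc_freq[t], t))
        else:
            label = "(unclustered)"
        clusters[label].append(doc_id)
    return dict(clusters)

# Invented sample titles, not real patent records.
docs = {
    "US1": "Battery electrode with lithium coating",
    "US2": "Lithium battery separator",
    "US3": "Antenna array for wireless handset",
    "US4": "Wireless antenna tuning circuit",
    "US5": "Garden hose reel",
}
clusters = cluster_by_shared_keyword(docs)
for label, members in clusters.items():
    print(label, members)
```

In this sketch the two battery titles and the two antenna titles each land in their own cluster, while the unrelated title shares no keyword with any other record and is set aside. A production tool would presumably use weighting and similarity measures well beyond this.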
Once the Text Clustering option is chosen, users are presented with a dialog window where they can select which portions of the patent text they wish to include in the analysis. Note that Thomson Innovation allows users to choose any or all standard fields of the patent text to include in the analysis, including DWPI data fields (title and abstract).
Editor's Note: As the Thomson Innovation help materials emphasize, the ability to choose DWPI fields for analysis can result in cleaner, more cogent analysis results. This is because Derwent titles and abstracts are rewritten using standard industry language, whereas original patent authors can be "their own lexicographers" and choose unusual or non-standard language to describe their inventions. Original patent titles and abstracts are also sometimes vague, short, or otherwise unhelpful.
In general, the editors agree with the logic of Thomson's assertion that using DWPI fields in text analysis can be very helpful. Although there are some downsides to searching Derwent data (each patent family receives only one abstract, users must accept the error rate of human indexers, etc.), for a text analysis project it seems that using Derwent data can provide more coherent results. As long as users can accept the extra risk involved, it could be well worth it to utilize the Derwent data.
The results of a text clustering analysis are shown below. The pane to the left shows each individual cluster represented by extracted keyword terms. (Hovering the cursor over an individual cluster will display an extended list of terms included in the analysis.) Patents within any cluster can be loaded in the right side pane. Users may select any individual record for closer inspection. Note that the full Result Set menu bar is also available to allow users to save, create alerts, add records to the marked list, print, export, download, or conduct further analysis on the results in a single cluster. In other words, it is possible to conduct a second level of text analysis for the records in a single cluster of interest.
It appears that each record from the original data set is placed into only one cluster. In the figure above, the document counts in each cluster add up to a total of 347, the number of records in the initial data set.
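Matching totals are consistent with one-cluster-per-record assignment but do not prove it on their own: a record counted in two clusters can offset a record that appears in none. A minimal sketch of a stricter check follows, assuming the cluster memberships have been exported as lists of record identifiers; the function names and sample data are hypothetical.

```python
from collections import Counter

def assignment_counts(clusters):
    """Count how many clusters each record ID appears in."""
    return Counter(rec for members in clusters.values() for rec in members)

def is_partition(clusters, all_records):
    """True when every record appears in exactly one cluster."""
    counts = assignment_counts(clusters)
    return set(counts) == set(all_records) and all(n == 1 for n in counts.values())

# Invented record IDs for illustration only.
records = ["US1", "US2", "US3", "US4"]
exclusive = {"battery": ["US1", "US2"], "antenna": ["US3", "US4"]}
# Cluster sizes still sum to 4, but US3 is counted twice and US4 is missing.
overlapping = {"battery": ["US1", "US2", "US3"], "antenna": ["US3"]}

print(is_partition(exclusive, records))
print(is_partition(overlapping, records))
```

Both sample groupings have cluster sizes that sum to the number of records, yet only the first assigns every record to exactly one cluster.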
Each cluster is shown next to an "expand" icon. Expanding a single cluster will show sub-level analysis within the chosen cluster. These sub-groups appear to be represented by the keywords that they have in common within the larger cluster. When a cluster is selected, a box below the cluster menu will display details about the cluster, including the additional keywords used to construct it.
To learn more about saving text clustering projects, see Saving Analysis Results.
Editor's Note: Innovation's clustering feature offers several improvements over Delphion's initial feature: it allows users to analyze the entire patent text and/or the DWPI fields if so desired, and it provides a more functional interface for viewing individual patent documents within a cluster.
Notably, Innovation does not offer any ability to visualize these clusters, or generate similarity scores between these clusters. This is an aspect of the Delphion clustering feature that was not implemented; instead, the ThemeScape tool, discussed in the next section of this article, can be used to visualize concept relationships over large data sets.
Sources
- "Text Clustering." Thomson Innovation website, http://www.thomsoninnovation.com/tip-innovation/support/help/clustering.htm. Accessed September 18, 2012.



